As what as you like

  • A little bit of automation of the Trello Mac App

    Trello have a Mac app, which I use for work. It struck me this morning that several of my recurring calendar events, which exist to remind me to review a particular board, would be much more pleasant if they contained a link that opened the board directly.

    That would be easy if I used the Trello website, but I quite like the app (even though it’s really just a browser pretending to be an app), so I went spelunking.

    To cut a long story short, the Trello Mac app registers itself as a handler for trello:// URLs, so if you take any trello.com board URL and replace the https:// part with trello:// you can use it as a link in your calendar (or anywhere else) and it will open the board in the app.
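
    If you have a lot of links to convert, the scheme swap is easy to script. Here is a minimal sketch in C; the function name trello_app_url() and the board URL in the usage note are made up for illustration, and the only behaviour it relies on is the https:// to trello:// substitution described above:

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Rewrite "https://..." as "trello://..." into out.
     * Returns 0 on success, -1 if web_url does not start with https://
     * or the result does not fit in out. (Hypothetical helper.) */
    static int trello_app_url(const char *web_url, char *out, size_t out_len) {
        static const char web_scheme[] = "https://";
        static const char app_scheme[] = "trello://";
        const size_t prefix_len = sizeof(web_scheme) - 1;

        assert(web_url != NULL && out != NULL);

        /* Only touch URLs that really use the https scheme */
        if (strncmp(web_url, web_scheme, prefix_len) != 0) {
            return -1;
        }

        /* Keep everything after the scheme exactly as it was */
        int written = snprintf(out, out_len, "%s%s", app_scheme, web_url + prefix_len);
        if (written < 0 || (size_t)written >= out_len) {
            return -1;
        }
        return 0;
    }
    ```

    Given a (made up) board URL such as https://trello.com/b/abc123/my-board, this fills the buffer with trello://trello.com/b/abc123/my-board, which is what goes into the calendar event.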

  • Receiving remote syslog events with systemd

    Systemd includes journald, a fancy replacement for the venerable syslog daemon (and its descendants, syslog-ng and rsyslog).

    One interesting, but frustrating, decision by journald’s maintainers is that it does not speak the syslog network protocol, so it’s unable to receive remote syslog events. Remote syslog is a tremendously useful feature for aggregating log data from many hosts on a network - I’ve always used it so my network devices can log somewhere I’m likely to look at, but I haven’t been able to do that since journald arrived.

    While there are many ways to skin this goose, the method I’ve chosen is a tiny Python daemon that listens on syslog’s UDP port (514), does minimal processing of the data and then feeds it into journald via its API, to get the data as rich as possible (since one of journald’s strengths is that it can store a lot more metadata about a log entry).

    So, here is the source for the daemon, and here is the systemd service file that manages it - note that it runs as an unprivileged user, with the sole privilege escalation of being able to bind to low port numbers (something only root can do normally).

    The daemon is certainly not perfect (patches welcome!), but it works. Here is a journald log entry from one of my UniFi access points:

    Jun 15 21:28:26 gnubert ("U7PG2,802aa8d48ab3,v3.9.27.8537")[23506]: kernel: [4251792.410000] [wifi1] FWLOG: [58855274] BEACON_EVENT_SWBA_SEND_FAILED (  )
    

    The more syslog-obsessed among you will notice that I’m setting the identifier to the hostname of the device that sent the message. Internally, the facility is mapped correctly, as is the priority. The text of the message then appears, prepended by its identifier.
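
    For anyone curious what "mapping the facility and priority" involves: a syslog datagram starts with a <PRI> prefix, where PRI is facility * 8 + severity. The sketch below decodes that prefix in C. It is not the daemon's code (the daemon is Python), and the names syslog_pri and parse_syslog_pri() are invented for this illustration:

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stdlib.h>

    /* Decoded form of the "<PRI>" prefix on a syslog message. */
    struct syslog_pri {
        int facility;   /* PRI / 8, e.g. 0 = kern, 3 = daemon, 16 = local0 */
        int severity;   /* PRI % 8, 0 = emerg ... 7 = debug */
    };

    /* Parse a leading "<N>" from msg (hypothetical helper).
     * On success, fills *out and returns a pointer to the text after '>'.
     * Returns NULL if msg does not begin with a valid PRI. */
    static const char *parse_syslog_pri(const char *msg, struct syslog_pri *out) {
        assert(msg != NULL && out != NULL);

        /* Must look like '<' followed immediately by a digit */
        if (msg[0] != '<' || !isdigit((unsigned char)msg[1])) {
            return NULL;
        }

        char *end = NULL;
        long pri = strtol(msg + 1, &end, 10);

        /* 191 is the largest valid value: facility 23, severity 7 */
        if (*end != '>' || pri > 191) {
            return NULL;
        }

        out->facility = (int)(pri / 8);
        out->severity = (int)(pri % 8);
        return end + 1;
    }
    ```

    The severity uses the same 0-7 scale as journald's PRIORITY field, so it can be passed straight through, and the returned pointer is the remaining message text.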

  • Homebridge in Docker, an adventure in networking

    Homebridge is a great way of connecting loads of devices that don’t support Apple’s HomeKit, to your iOS devices. It consists of a daemon that understands the HomeKit Accessory Protocol and many plugins that talk to other devices/services.

    My home server is running Ubuntu, so installing Homebridge is fairly trivial, except I run all my services in Docker containers. To make things even more fun, I don’t build or manage the containers by hand - the building is done by Docker Hub and the containers are deployed and managed by Ansible.

    So far so good, except that for a long time Homebridge used Avahi (an Open Source implementation of Apple’s Bonjour host/service discovery protocol) to announce its devices. That presented a small challenge in that I didn’t want to have Avahi running only in that container, so I had to bind mount /var/run/avahi-daemon/ into the container.

    I recently rebuilt my Homebridge container to pull it up to the latest versions of Homebridge and the plugins I use, but it was no longer announcing devices on my LAN, and there were no mentions of Avahi in its log. After some digging, it turns out that the HomeKit Accessory Protocol (HAP) library that Homebridge uses now instantiates its own multicast DNS stack rather than using Avahi.

    Apart from not actually working, this was great news: I could remove the /var/run bind mount from the container, making things more secure. I just needed to figure out why it wasn’t showing up.

    The HAP library that Homebridge uses ends up depending on this library to implement mDNS, which makes a very simple decision about which network interface it should use. In my case, it was choosing the docker0 bridge interface, which explicitly isn’t connected to the outside world. With no configuration options at the Homebridge level to influence the choice of interface, I had to solve the problem at the Docker network layer.

    So, the answer was the following Ansible task to create a Docker network that is attached to my LAN interface (bridge0) and give it a small portion of a reserved segment in the IP subnet I use:

    - name: Configure LANbridge network
      docker_network:
        name: lanbridge
        driver: macvlan
        driver_options:
          parent: bridge0
        ipam_options:
          subnet: '10.0.88.0/24'
          gateway: '10.0.88.1'
          iprange: '10.0.88.32/29'
    

    then change the task for the Homebridge container to use this network:

      network_mode: lanbridge
    

    and now Homebridge is up to date, and working, plus I have a Docker network I can use in the future if any other containerised services need to be very close to the LAN.
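
    As a footnote on the "small portion" of the subnet: the iprange in the task above, 10.0.88.32/29, covers eight addresses, 10.0.88.32 to 10.0.88.39. If you want to sanity check a different range before handing it to Docker, the arithmetic is short. This is just a sketch of the CIDR maths in C; cidr_range() and ipv4_range are names made up for the example:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* First and last IPv4 address covered by a CIDR block, as host-order integers. */
    struct ipv4_range {
        uint32_t first;
        uint32_t last;
    };

    /* Parse "a.b.c.d/bits" and compute the range it covers (hypothetical helper).
     * Returns 0 on success, -1 if the string is not a valid IPv4 CIDR block. */
    static int cidr_range(const char *cidr, struct ipv4_range *out) {
        unsigned a, b, c, d, bits;

        assert(cidr != NULL && out != NULL);

        if (sscanf(cidr, "%u.%u.%u.%u/%u", &a, &b, &c, &d, &bits) != 5) {
            return -1;
        }
        if (a > 255 || b > 255 || c > 255 || d > 255 || bits > 32) {
            return -1;
        }

        uint32_t addr = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;

        /* A /0 needs special handling because shifting by 32 is undefined */
        uint32_t mask = (bits == 0) ? 0 : (UINT32_MAX << (32 - bits));

        out->first = addr & mask;          /* all host bits cleared */
        out->last  = out->first | ~mask;   /* all host bits set */
        return 0;
    }
    ```

    For 10.0.88.32/29, last - first + 1 comes out as 8, and both ends sit inside the 10.0.88.0/24 subnet given in the same task.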

  • Adventures in Lua stack overflows

    Hammerspoon is heavily dependent on Lua - it’s the true core of the application, so it’s unavoidable that we have to interact with Lua’s C API in a lot of places. If you’ve never used it before, Lua’s C API is designed to be very simple to integrate with other code, but it also places a fairly high burden on developers to integrate it properly.

    One of the ways that Lua remains simple is by being stack based - when you give Lua a C function and make it available to call from Lua code, you have to conform to a particular way of working. The function arguments supplied by the user will be presented to you on a stack, and when your C code has finished its work, the return values must have been pushed onto the stack. Here’s an example:

    static int someUsefulFunction(lua_State *L) {
        // Fetch our first argument from the stack
        int someNumber = lua_tointeger(L, 1);
    
        // Fetch our second argument from the stack
        const char *someString = lua_tostring(L, 2);
    
        /* Do some useful work here */
    
        // Push two return values onto the stack and return 2 so Lua knows how many return values we provided
        lua_pushstring(L, "some result text");
        lua_pushinteger(L, 42);
        return 2;
    }
    

    All simple enough.

    In this scenario of calling from Lua→C, Lua creates a pseudo-stack for you, so while it’s good practice to keep the stack neat and tidy (i.e. remove things from it that you don’t need), it’s not critical because apart from the return values, the rest of the stack is thrown away. That pseudo-stack only has 20 slots by default though, so if you’re pushing a lot of return arguments, or using the stack for other things, you may need to use lua_checkstack() to grow it larger, up to the maximum (2048 slots).

    Where things get more interesting, is when you’re interacting with the Lua stack without having crossed a Lua→C boundary. For example, maybe you’re in a callback function that’s been triggered by some event in your C program, and now you need to call a Lua function that the user gave you earlier. This might look something like this:

    int globalLuaFunction;
    void someCallback(int aValue, char* aString) {
        // Fetch a pointer to the shared Lua state object
        lua_State *L = some_shared_lua_state_provider();
    
        // Push onto the stack, the Lua function previously supplied by the user, from Lua's global registry
        lua_rawgeti(L, LUA_REGISTRYINDEX, globalLuaFunction);
    
        // Push the two arguments for the Lua function
        lua_pushinteger(L, aValue);
        lua_pushstring(L, aString);
    
        // Call the Lua function, telling Lua to expect two arguments
        lua_call(L, 2, 0);
    
        return;
    }
    

    Slightly more complex than the last example, but still manageable. Unfortunately in practice this is a fairly suboptimal implementation of a C→Lua call - storing things in the LUA_REGISTRYINDEX table is fine, but it’s often nicer to use multiple tables for different things. The big problem here though is that lua_call() doesn’t trap errors. If the Lua code raises an exception, Lua will longjmp to a panic handler and abort() your app.

    So, writing this a bit more completely, we get:

    int luaCallbackTable;
    int globalLuaFunctionRef;
    void someCallback(int aValue, char* aString) {
        // Fetch a pointer to the shared Lua state object
        lua_State *L = some_shared_lua_state_provider();
    
        // Push onto the stack, the table we keep callback references in, from Lua's global registry
        lua_rawgeti(L, LUA_REGISTRYINDEX, luaCallbackTable);
    
        // Push onto the stack, from our callback reference table, the Lua function previously supplied by the user
        lua_rawgeti(L, -1, globalLuaFunctionRef);
    
        // Push the two arguments for the Lua function
        lua_pushinteger(L, aValue);
        lua_pushstring(L, aString);
    
        // Protected call to the Lua function, telling Lua to expect two arguments
        lua_pcall(L, 2, 0, 0);
    
        return;
    }
    

    OK, so this is looking better: we have our own table for neatly storing function references and we’ll no longer abort() if the Lua function throws an error.

    However, we now have a problem: we’re leaking at least one item onto Lua’s stack (the callback table) and possibly two (the error message, if the call fails). Unlike in the Lua→C case, we are not operating within the safe confines of a pseudo-stack, so anything we leak here will stay permanently on the stack, and at some point that’s likely to cause the stack to overflow.

    Now here is the kicker - stack overflows are really hard to find by default. You don’t typically get a nice error. Your program simply leaks stack slots until the stack overflows, far from the place where the leak is happening, then segfaults, and your backtraces will have very normal-looking Lua API calls in them.

    If we were to handle the stack properly, the above code would actually look like this (and note that we’ve gone from four Lua API calls in the first C→Lua example, to eight here):

    int luaCallbackTable;
    int globalLuaFunctionRef;
    void someCallback(int aValue, char* aString) {
        // Fetch a pointer to the shared Lua state object
        lua_State *L = some_shared_lua_state_provider();
    
        // Find luaCallbackTable in the Lua registry, and push it onto the stack
        lua_rawgeti(L, LUA_REGISTRYINDEX, luaCallbackTable);
    
        // Find globalLuaFunctionRef in luaCallbackTable, and push it onto the stack
        lua_rawgeti(L, -1, globalLuaFunctionRef);
    
        // Remove luaCallbackTable from the stack *THIS WAS LEAKED IN THE ABOVE EXAMPLE*
        lua_remove(L, -2);
    
        // Push the two arguments for the Lua function
        lua_pushinteger(L, aValue);
        lua_pushstring(L, aString);
    
        // lua_pcall() returns LUA_OK (0) on success and a non-zero status code on failure
        if (lua_pcall(L, 2, 0, 0) != LUA_OK) {
            // Fetch the Lua error message from the stack
            const char *someError = lua_tostring(L, -1);
            printf("ERROR: %s\n", someError);
    
            // Remove the Lua error message from the stack *THIS WAS LEAKED IN THE ABOVE EXAMPLE*
            lua_pop(L, 1);
        }
    
        return;
    }
    

    Hammerspoon has been having problems like this for the last few months - lots of crash reports that, on the surface, looked like completely valid code was executing. I have to admit that it took me a lot longer than it should have to realise that these were Lua stack overflows rather than my initial suspicion (C heap corruption), but we figured it out eventually and have hopefully fixed all of the leaks.

    So, how did we discover that the problem was stack overflows, and how did we discover where all of the leaks were without manually auditing all of the places where we make C→Lua transitions (of which there are over 100)? The answer to the first question is very simple: if you define LUA_USE_APICHECK when compiling Lua, it will do a little extra work to verify its consistency. Crucially, this includes calling abort() with a helpful message when the stack overflows. We turned this on for developers in March and then released 0.9.61 with it enabled, in early April. It’s not normally recommended to have the API checker enabled in production because it calls abort(), but we felt that it was important to get more information about the crashes we couldn’t reproduce.

    Within a few days we started getting crash reports with the words stack overflow in them (as well as a few other errors, which we were able to fix), but that is only half the battle.

    Having discovered that we did definitely have a stack leak somewhere, how did we discover where it was? This did involve a little brute force effort, but thankfully not a full manual audit of all 107 C→Lua call sites. Instead, I wrote two macros:

    #define _lua_stackguard_entry(L) int __lua_stackguard_entry=lua_gettop(L);
    #define _lua_stackguard_exit(L) assert(__lua_stackguard_entry == lua_gettop(L));
    

    These are very simple to use - you call _lua_stackguard_entry() just after you’ve obtained a pointer to the Lua state object, and then you call _lua_stackguard_exit() at every point where the function can return after that. The entry macro records the size of the stack (lua_gettop()) and the exit macro assert()s that it’s the same at the exit point (assert() also calls abort() if something is wrong, so now we would get crash logs with the crash in the actual function where the leak is happening). These entry/exit calls were added to all 107 call sites 4 days after 0.9.61 was released, and I spent 3 evenings testing or manually verifying every site before releasing 0.9.65 (0.9.62-0.9.64 fixed some of the other bugs found by the API checker in the meantime).
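
    To see why this pins the blame on the right function, here is a toy model of the same idea that compiles without Lua. Everything in it is a stand-in invented for illustration: toy_state only counts how many slots are in use (it is not a lua_State), and the guard reports a mismatch instead of assert()ing so that both outcomes can be observed:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Toy stand-in for a Lua state: it only tracks how many stack slots are in use. */
    typedef struct {
        int top;
    } toy_state;

    static int toy_gettop(const toy_state *S) { return S->top; }
    static void toy_push(toy_state *S) { S->top++; }
    static void toy_pop(toy_state *S, int n) {
        assert(n >= 0 && n <= S->top);
        S->top -= n;
    }

    /* Same shape as the entry/exit guard: remember the stack size on entry,
     * compare against it on exit. This version yields a bool rather than aborting. */
    #define TOY_STACKGUARD_ENTRY(S) int toy_guard_entry_top = toy_gettop(S)
    #define TOY_STACKGUARD_BALANCED(S) (toy_guard_entry_top == toy_gettop(S))

    /* Pushes a table and a function, "calls" the function, but forgets the table. */
    static bool leaky_callback(toy_state *S) {
        TOY_STACKGUARD_ENTRY(S);
        toy_push(S);    /* callback table */
        toy_push(S);    /* function fetched from the table */
        toy_pop(S, 1);  /* the call consumes the function */
        return TOY_STACKGUARD_BALANCED(S);  /* false: the table is still on the stack */
    }

    /* The same sequence, but the table is removed before returning. */
    static bool tidy_callback(toy_state *S) {
        TOY_STACKGUARD_ENTRY(S);
        toy_push(S);    /* callback table */
        toy_push(S);    /* function fetched from the table */
        toy_pop(S, 1);  /* the call consumes the function */
        toy_pop(S, 1);  /* remove the callback table */
        return TOY_STACKGUARD_BALANCED(S);  /* true: stack is back where it started */
    }
    ```

    Each call to leaky_callback() leaves one extra slot behind and the guard notices straight away, in that function, rather than many calls later when the stack finally runs out.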

    At the time of writing we’re only 24 hours past the release of 0.9.65, but so far things are looking good - no strange Lua segfault crash reports as yet. There was one issue found today where I’d placed a _lua_stackguard_exit() call after a C statement that seemed unimportant, but actually caused an important object to be freed, but that is already fixed and will be included in 0.9.66.

    Assuming we have now fixed the problem, after months of head-scratching, and a few weeks of research, testing and coding, it turns out that across the 107 call sites we only had two stack leaks - one was in the code that handles tab completion in Hammerspoon’s Console window, and the other was in hs.notify. Hopefully you’re all enjoying a more stable Hammerspoon experience, but I think we’ll be leaving both the API checker and the stack guard macros enabled since they make it very easy to find/fix these sorts of bugs. I’d rather get a smaller number of crashes sooner, than have more months of head-scratching!

  • Getting battery data from AirPods in macOS

    A recent Hammerspoon feature request asked us to add support for reading battery information from AirPods.

    Unfortunately because their battery status is quite complex (two earbuds and the case), this information is not reported via the normal IOKit APIs, but with a bit of poking around in the results of class-dump for macOS High Sierra I was able to find some relevant methods/properties on IOBluetoothDevice that let you read information about the battery level of individual AirPods and the case, plus determine which of the buds are currently in an ear!

    So, the next release of Hammerspoon should include this code to expose all of this information neatly via hs.battery.privateBluetoothBatteryInfo() 😁