Archive

Abusing Gmail as a ghetto dashboard

I'm sure many of us receive regular emails from the same source - by which I mean things like a daily status email from a backup system, or a weekly newsletter from a blogger/journalist we like, etc.

These are a great way of getting notified or kept up to date, but every one of these you receive is also a piece of work you need to do, to keep your Inbox under control. Gmail has a lot of powerful filtering primitives, but as far as I am able to tell, none of them let you manage this kind of email without compromise.

My ideal scenario would be that, for example, my daily backup status email would keep the most recent copy in my Inbox, and automatically archive older ones. Same for newsletters - if I didn't read last week's one, I'm realistically never going to, so once it's more than a couple of weeks stale, just get it out of my Inbox.

Thankfully, Google has an indirect way of making this sort of thing work - Google Apps Script. You can trigger small JavaScript scripts to run every so often, and operate on your data in various Google apps, including Gmail.

So, I quickly wrote this script and it runs every few hours now:

// Configuration data
// Each config should have the following keys:
//  * age_min: maps to 'older_than:' in gmail query terms
//  * age_max: maps to 'newer_than:' in gmail query terms
//  * query: freeform gmail query terms to match against
//
// The age_min/age_max values don't need to exist, given the freeform query value,
// but age_min forces you to think about how frequent the emails are, and age_max
// forces you to not search for every single email that matches the query
//
// TODO:
//  * Add a per-config flag that skips the archiving if there's only one matching thread (so the most recent matching email always stays in Inbox)
var configs = [
  { age_min:"14d", age_max:"90d", query:"subject:(Benedict's Newsletter)" },
  { age_min:"7d",  age_max:"30d", query:"from:hello@visualping.io subject:gnubert" },
  { age_min:"1d",  age_max:"7d",  query:"subject:(Nightly clone to Thunderbay4 Successfully)" },
  { age_min:"1d",  age_max:"7d",  query:"from:Amazon subject:(Arriving today)" },
  ];

function processInbox() {
  for (var config_key in configs) {
    var config = configs[config_key];
    Logger.log("Processing query: " + config["query"]);

    var threads = GmailApp.search("in:inbox " + config["query"] + " newer_than:" + config["age_max"] + " older_than:" + config["age_min"]);
    for (var thread_key in threads) {
      var thread = threads[thread_key];
      Logger.log("  Archiving: " + thread.getFirstMessageSubject());

      thread.markRead();
      thread.moveToArchive();
    }
  }
}

(apologies for the very basic JavaScript - it's not a language I have any real desire to be good at. Don't @ me).
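
The TODO in the script's header comment (always keep the newest matching thread in the Inbox) could be sketched roughly like this. It is only an illustration: keep_newest is a hypothetical per-config flag, selectThreadsToArchive is a made-up helper, and it assumes the search for such a config is run without the older_than: term so that the newest thread is among the results. The only Gmail method it relies on is getLastMessageDate() on each thread.

```javascript
// Hypothetical helper: given the threads matching a config, return the ones
// to archive. With keepNewest set, the most recently updated thread is spared.
function selectThreadsToArchive(threads, keepNewest) {
  if (!keepNewest) {
    return threads.slice();
  }
  // Sort a copy newest-first, using the date of each thread's last message
  var sorted = threads.slice().sort(function (a, b) {
    return b.getLastMessageDate().getTime() - a.getLastMessageDate().getTime();
  });
  // Drop the newest thread and archive everything else
  return sorted.slice(1);
}
```

In processInbox() the inner loop would then iterate over selectThreadsToArchive(threads, config["keep_newest"]) instead of over threads directly.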


Fixing an error in Xcode Instruments's Leaks profile

As part of our general effort to try and raise the quality of Hammerspoon, I've been working with @latenitefilms to track down some memory leaks, which can be very easy if you use the Leaks profile in Xcode's "Instruments" tool. I tried this various ways, but I kept running into this error:

Screenshot

After asking on the Apple Developer Forums we got an interesting response from an Apple employee that code signing might be involved. One change later to not do codesigning on Profile builds and Leaks is working again!

So there we go, if you see "An error occurred trying to capture Leaks data" and "Unable to acquire required task port", one thing to check is your code signing setup. I don't know what specifically was wrong, but it's easy enough to just not sign local debug/profile builds most of the time anyway.


AmigaOS 4.1 Final Edition in Qemu

So this is a fun one: some marvellous hackers, including Zoltan Balaton and Sebastian Bauer, have been working on Qemu to add support for the Sam460ex motherboard, a PowerPC system from 2010. Of particular interest to me is that this board received an official port of AmigaOS 4, the spiritual successor to AmigaOS, one of my very favourite operating systems.

I'll probably write more about this later, but for now, here is a simple screenshot of the install CD having just booted.

Update: Zoltan has published a page with information about how to get it working, see here

Screenshot


Home networking like a pro - Part 1.1 - Network Storage Redux

Back in this post I described having switched from a Mac Mini + DAS setup, to a Synology and an Intel NUC setup, for my file storage and server needs.

For a time it was good, but I found myself wanting to run more server daemons, and the NUC wasn't really able to keep up. The Synology was plodding along fine, but I made the decision to unify them all into a more beefy Linux machine.

So, I bought an AMD Ryzen 5 1600 CPU and an A320M motherboard, 16GB of RAM and a micro ATX case with 8 drive bays, and set to work. That quickly proved to be a disaster because Linux wasn't stable on the AMD CPU - I hadn't even thought to check, because why wouldn't Linux be stable on an x86_64 CPU in 2018?! With that lesson learned, I swapped out the board/CPU for an Intel i7-8700 and a Z370 motherboard.

I didn't go with FreeNAS as my previous post suggested I might, because ultimately I wanted complete control, so it's a plain Ubuntu Server machine that is fully managed by Ansible playbooks. In retrospect it was a mistake to try and delegate server tasks to an appliance like the Synology, and it was a further mistake to try and deal with that by getting the NUC - I should have just cut my losses and gone straight to a Linux server. Lesson learned!

Rather than getting lost in the weeds of purchase choices and justifications, let's look at some of the things I'm doing to the server with Ansible.

First up is root disk encryption - it's nice to know that your data is private when at rest, but a headless machine in a cupboard is not a fun place to be typing a password on boot. Fortunately I have two ways round this - firstly, a KVM (a Lantronix Spider) and secondly, one can add dropbear to an initramfs so you can ssh into the initramfs to enter the password.

Here are the playbook tasks that put dropbear into the initramfs:

- name: Install dropbear-initramfs
  apt:
    name: dropbear-initramfs
    state: present

- name: Install busybox-static
  apt:
    name: busybox-static
    state: present

# This is necessary because of https://bugs.launchpad.net/ubuntu/+source/busybox/+bug/1651818
- name: Add initramfs hook to fix cryptroot-unlock
  copy:
    dest: /etc/initramfs-tools/hooks/zz-busybox-initramfs-fix
    src: dropbear-initramfs/zz-busybox-initramfs-fix
    mode: 0744
    owner: root
    group: root
  notify: update initramfs

- name: Configure dropbear-initramfs
  lineinfile:
    path: /etc/dropbear-initramfs/config
    regexp: 'DROPBEAR_OPTIONS'
    line: 'DROPBEAR_OPTIONS="-p 31337 -s -j -k -I 60"'
  notify: update initramfs

- name: Add dropbear authorized_keys
  copy:
    dest: /etc/dropbear-initramfs/authorized_keys
    src: dropbear-initramfs/dropbear-authorized_keys
    mode: 0600
    owner: root
    group: root
  notify: update initramfs

# The format of the ip= kernel parameter is: <client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# It comes from https://git.kernel.org/pub/scm/libs/klibc/klibc.git/tree/usr/kinit/ipconfig/README.ipconfig?id=HEAD
- name: Configure boot IP and consoleblanking
  lineinfile:
    path: /etc/default/grub
    regexp: 'GRUB_CMDLINE_LINUX_DEFAULT'
    line: 'GRUB_CMDLINE_LINUX_DEFAULT="ip=10.0.88.11::10.0.88.1:255.255.255.0:gnubert:enp0s31f6:none loglevel=7 consoleblank=0"'
  notify: update grub

While this does rely on some external files, the important one is zz-busybox-initramfs-fix which works around a bug in the busybox build that Ubuntu is currently using. Rather than paste the whole script here, you can see it here.

The last task in the playbook configures Linux to boot with a particular networking config on a particular NIC, so you can ssh in. Once you're in, just run cryptroot-unlock and your encrypted root is unlocked!
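
To make the layout of that ip= parameter a bit more concrete, here is a small sketch that assembles it from named fields. buildIpParam and its property names are made up for illustration; the field order is the one given in the comment in the playbook, and the values are the ones from the GRUB line.

```javascript
// Field order follows the comment in the playbook:
// <client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
function buildIpParam(net) {
  var fields = [
    net.clientIp, net.serverIp, net.gatewayIp, net.netmask,
    net.hostname, net.device, net.autoconf
  ];
  // Unused fields are left empty, but their ':' separators must remain
  return "ip=" + fields.map(function (f) { return f || ""; }).join(":");
}

console.log(buildIpParam({
  clientIp: "10.0.88.11",
  gatewayIp: "10.0.88.1",
  netmask: "255.255.255.0",
  hostname: "gnubert",
  device: "enp0s31f6",
  autoconf: "none"
}));
// prints "ip=10.0.88.11::10.0.88.1:255.255.255.0:gnubert:enp0s31f6:none"
```

The doubled colon after the client IP is the empty server-ip field, which isn't needed here.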

Another interesting thing I'm doing, is using Borg for some backups. It's a pretty clever backup system, and it works over SSH, so I use the following Ansible task to allow a particular SSH key to log in to the server as root, in a way that forces it to use Borg:

- name: Deploy ssh borg access
  authorized_key:
    user: root
    state: present
    key_options: 'command="/usr/bin/borg serve --restrict-to-path /srv/tank/backups/borg",restrict'
    key: "ssh-rsa BLAHBLAH cmsj@foo"

Now on client machines I can run borg create --exclude-caches --compression=zlib -v -p -s ssh://gnuborg:22/srv/tank/backups/borg/foo/backups.borg::cmsj-{utcnow} $HOME and because gnuborg is defined in ~/.ssh/config it will use all the right ssh options (username, hostname and the SSH key created for this purpose):

Host gnuborg
  User root
  Hostname gnubert.local
  IdentityFile ~/.ssh/id_rsa_herborg

Homebridge server monitoring

Homebridge is a great way to expose arbitrary devices to Apple's HomeKit platform. It has helped bridge the Google Nest and Netgear Arlo devices I have in my home into my iOS devices, since neither of those manufacturers appears to be interested in becoming officially HomeKit compatible.

London has been having a little bit of a heatwave recently and it got me thinking about the Linux server I have running in a closet under the stairs - it has pretty poor airflow available to it, and I didn't know how hot its CPU was getting.

So, by the power of JavaScript, Homebridge and Linux's /sys filesystem, I was able to quickly whip up a plugin for Homebridge that will read an entry from Linux's temperature monitoring interface, and present it to HomeKit. In theory I could use it for sending notifications, but in practice I'm doing that via Grafana - the purpose of getting the information in HomeKit is so I can ask Siri what the server's temperature is.

The configuration is very simple, allowing you to configure one temperature sensor per instance of the plugin (but you could define multiple instances in your Homebridge config.json):

{
    "accessory": "LinuxTemperature",
    "name": "gnubert",
    "sensor_path": "/sys/bus/platform/devices/coretemp.0/hwmon/hwmon0/temp1_input",
    "divisor": 1000
}

(gnubert is the hostname of my server).
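
As a rough idea of how sensor_path and divisor fit together, here is a minimal sketch of the reading step. It is not the plugin's actual source, and readTemperature is a made-up name; it assumes the file holds a single integer, which for Linux hwmon temp*_input files is millidegrees Celsius (hence the divisor of 1000).

```javascript
const fs = require("fs");

// Read a raw integer from a sysfs-style sensor file and scale it by divisor.
function readTemperature(sensorPath, divisor) {
  const raw = fs.readFileSync(sensorPath, "utf8").trim();
  const value = parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error("Unexpected sensor contents: " + raw);
  }
  return value / divisor;
}
```

With the config above, a file containing 45000 would come out as 45 degrees Celsius.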

Below is a screenshot showing the server's CPU temperature mingling with all of the Nest and Arlo items :)

Screenshot


A little bit of automation of the Trello Mac App

Trello have a Mac app, which I use for work and it struck me this morning that several recurring calendar events I have, which exist to remind me to review a particular board, would be much more pleasant if they contained a link that would open the board directly.

That would be easy if I used the Trello website, but I quite like the app (even though it's really just a browser pretending to be an app), so I went spelunking.

To cut a long story short, the Trello Mac app registers itself as a handler for trello:// URLs, so if you take any trello.com board URL and replace the https:// part with trello:// you can use it as a link in your calendar (or anywhere else) and it will open the board in the app.
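
If you have a lot of links to convert, the rewrite is a one-liner. This is just a sketch: toTrelloAppUrl is a made-up name and the board URL is a placeholder, not a real board.

```javascript
// Rewrite a trello.com board URL so that it opens in the Mac app.
function toTrelloAppUrl(webUrl) {
  return webUrl.replace(/^https:\/\//, "trello://");
}

console.log(toTrelloAppUrl("https://trello.com/b/abc123/example-board"));
// prints "trello://trello.com/b/abc123/example-board"
```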


Homebridge in Docker, an adventure in networking

Homebridge is a great way of connecting loads of devices that don't support Apple's HomeKit, to your iOS devices. It consists of a daemon that understands the HomeKit Accessory Protocol and many plugins that talk to other devices/services.

My home server is running Ubuntu, so installing Homebridge is fairly trivial, except I run all my services in Docker containers. To make things even more fun, I don't build or manage the containers by hand - the building is done by Docker Hub and the containers are deployed and managed by Ansible.

So far so good, except that for a long time Homebridge used Avahi (an Open Source implementation of Apple's Bonjour host/service discovery protocol) to announce its devices. That presented a small challenge in that I didn't want to have Avahi running only in that container, so I had to bind mount /var/run/avahi-daemon/ into the container.

I recently rebuilt my Homebridge container to pull it up to the latest versions of Homebridge and the plugins I use, but it was no longer announcing devices on my LAN, and there were no mentions of Avahi in its log. After some digging, it turns out that the HomeKit Accessory Protocol (HAP) library that Homebridge uses, now instantiates its own multicast DNS stack rather than using Avahi.

Apart from not actually working, this was great news: I could remove the /var/run bind mount from the container, making things more secure. I just needed to figure out why it wasn't showing up.

The HAP library that Homebridge uses, ends up depending on this library to implement mDNS and it makes a very simple decision about which network interface it should use. In my case, it was choosing the docker0 bridge interface which explicitly isn't connected to the outside world. With no configuration options at the Homebridge level to influence the choice of interface, I had to solve the problem at the Docker network layer.
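
To illustrate how a very simple interface choice can go wrong on a Docker host, here is a sketch of one naive strategy: take the first non-internal IPv4 address in whatever order the interfaces are listed. I'm not claiming this is the library's actual logic; pickFirstExternalIPv4 and the sample data are made up, with the data shaped like the result of Node's os.networkInterfaces().

```javascript
// Naive strategy: return the first non-internal IPv4 address found, in
// whatever order the interfaces happen to be listed.
function pickFirstExternalIPv4(interfaces) {
  for (const name of Object.keys(interfaces)) {
    for (const addr of interfaces[name]) {
      // family is normally the string "IPv4"; a few Node releases used the number 4
      const isIPv4 = addr.family === "IPv4" || addr.family === 4;
      if (isIPv4 && !addr.internal) {
        return { name: name, address: addr.address };
      }
    }
  }
  return null;
}

// Made-up data resembling a Docker host where docker0 is listed before the LAN bridge
const example = {
  lo: [{ address: "127.0.0.1", family: "IPv4", internal: true }],
  docker0: [{ address: "172.17.0.1", family: "IPv4", internal: false }],
  bridge0: [{ address: "10.0.88.11", family: "IPv4", internal: false }]
};
console.log(pickFirstExternalIPv4(example).name);
// prints "docker0"
```

Nothing about docker0 marks it as internal, so a check like this has no way of knowing it doesn't reach the LAN.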

So, the answer was the following Ansible task to create a Docker network that is attached to my LAN interface (bridge0) and give it a small portion of a reserved segment in the IP subnet I use:

- name: Configure LANbridge network
  docker_network:
    name: lanbridge
    driver: macvlan
    driver_options:
      parent: bridge0
    ipam_options:
      subnet: '10.0.88.0/24'
      gateway: '10.0.88.1'
      iprange: '10.0.88.32/29'

then change the task for the Homebridge container to use this network:

  network_mode: lanbridge

and now Homebridge is up to date, and working, plus I have a Docker network I can use in the future if any other containerised services need to be very close to the LAN.
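
For anyone wondering how small that "small portion" is, here is a quick sketch of the arithmetic behind the iprange value. The helper names are made up and it only handles well-formed IPv4 CIDR strings.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer
function ipToInt(ip) {
  return ip.split(".").reduce(function (acc, octet) {
    return acc * 256 + parseInt(octet, 10);
  }, 0);
}

// Convert an unsigned 32-bit integer back to dotted-quad form
function intToIp(n) {
  return [24, 16, 8, 0].map(function (shift) {
    return Math.floor(n / Math.pow(2, shift)) % 256;
  }).join(".");
}

// Return the first address, last address and size of a CIDR block
function cidrRange(cidr) {
  var parts = cidr.split("/");
  var size = Math.pow(2, 32 - parseInt(parts[1], 10));
  var first = Math.floor(ipToInt(parts[0]) / size) * size;
  return { first: intToIp(first), last: intToIp(first + size - 1), size: size };
}

var range = cidrRange("10.0.88.32/29");
console.log(range.first + " - " + range.last + " (" + range.size + " addresses)");
// prints "10.0.88.32 - 10.0.88.39 (8 addresses)"
```

So containers on the lanbridge network get addresses from a block of eight inside the wider 10.0.88.0/24 subnet.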


Receiving remote syslog events with systemd

Systemd includes journald, a fancy replacement for the venerable syslog daemon (and its descendants, syslog-ng and rsyslog).

One interesting, but frustrating, decision by journald's maintainers is that it does not speak the syslog network protocol, so it's unable to receive remote syslog events. Remote syslog is a tremendously useful feature for aggregating log data from many hosts on a network - I've always used it so my network devices can log somewhere I'm likely to look at, but I haven't been able to do that since journald arrived.

While there are many ways to skin this goose, the method I've chosen is a tiny Python daemon that listens on syslog's UDP port (514), does minimal processing of the data and then feeds it into journald via its API, to get the data as rich as possible (since one of journald's strengths is that it can store a lot more metadata about a log entry).

So, here is the source for the daemon, and here is the systemd service file that manages it - note that it runs as an unprivileged user, with the sole privilege escalation of being able to bind to low port numbers (something only root can do normally).

The daemon is certainly not perfect (patches welcome!), but it works. Here is a journald log entry from one of my UniFi access points:

Jun 15 21:28:26 gnubert ("U7PG2,802aa8d48ab3,v3.9.27.8537")[23506]: kernel: [4251792.410000] [wifi1] FWLOG: [58855274] BEACON_EVENT_SWBA_SEND_FAILED (  )

(the more syslog-obsessed among you will notice that I'm setting the identifier to the hostname of the device that sent the message. Internally, the facility is mapped correctly, as is the priority. The text of the message then appears, prepended by its identifier.)
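
For the curious, the facility/priority mapping boils down to splitting the number at the start of each syslog packet. The sketch below is not the daemon's code (that is Python, linked above); it is a JavaScript illustration of the arithmetic from RFC 3164, with a made-up function name and the RFC's own example message.

```javascript
// Split the leading <PRI> of a syslog message into facility and severity.
// Per RFC 3164, PRI = facility * 8 + severity.
function parsePri(message) {
  var match = /^<(\d{1,3})>/.exec(message);
  if (!match) {
    return null;
  }
  var pri = parseInt(match[1], 10);
  return {
    facility: Math.floor(pri / 8),
    severity: pri % 8,
    rest: message.slice(match[0].length)
  };
}

var parsed = parsePri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed");
console.log(parsed.facility + " " + parsed.severity);
// prints "4 2"
```

Facility 4 is auth and severity 2 is critical; the severity is the number that journald stores as a log entry's priority.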


Adventures in Lua stack overflows

Hammerspoon is heavily dependent on Lua - it's the true core of the application, so it's unavoidable that we have to interact with Lua's C API in a lot of places. If you've never used it before, Lua's C API is designed to be very simple to integrate with other code, but it also places a fairly high burden on developers to integrate it properly.

One of the ways that Lua remains simple is by being stack based - when you give Lua a C function and make it available to call from Lua code, you have to conform to a particular way of working. The function arguments supplied by the user will be presented to you on a stack, and when your C code has finished its work, the return values must have been pushed onto the stack. Here's an example:

static int someUsefulFunction(lua_State *L) {
    // Fetch our first argument from the stack
    int someNumber = lua_tointeger(L, 1);

    // Fetch our second argument from the stack
    const char *someString = lua_tostring(L, 2);

    /* Do some useful work here */

    // Push two return values onto the stack and return 2 so Lua knows how many return values we provided
    lua_pushstring(L, "some result text");
    lua_pushinteger(L, 42);
    return 2;
}

All simple enough.

In this scenario of calling from Lua→C, Lua creates a pseudo-stack for you, so while it's good practice to keep the stack neat and tidy (i.e. remove things from it that you don't need), it's not critical because apart from the return values, the rest of the stack is thrown away. That pseudo-stack only has 20 slots by default though, so if you're pushing a lot of return values, or using the stack for other things, you may need to use lua_checkstack() to grow it, up to the maximum (2048 slots).

Where things get more interesting, is when you're interacting with the Lua stack without having crossed a Lua→C boundary. For example, maybe you're in a callback function that's been triggered by some event in your C program, and now you need to call a Lua function that the user gave you earlier. This might look something like this:

int globalLuaFunction;
void someCallback(int aValue, char* aString) {
    // Fetch a pointer to the shared Lua state object
    lua_State *L = some_shared_lua_state_provider();

    // Push onto the stack, the Lua function previously supplied by the user, from Lua's global registry
    lua_rawgeti(L, LUA_REGISTRYINDEX, globalLuaFunction);

    // Push the two arguments for the Lua function
    lua_pushinteger(L, aValue);
    lua_pushstring(L, aString);

    // Call the Lua function, telling Lua to expect two arguments
    lua_call(L, 2, 0);

    return;
}

Slightly more complex than the last example, but still manageable. Unfortunately in practice this is a fairly suboptimal implementation of a C→Lua call - storing things in the LUA_REGISTRYINDEX table is fine, but it's often nicer to use multiple tables for different things. The big problem here though is that lua_call() doesn't trap errors. If the Lua code raises an exception, Lua will longjmp to a panic handler and abort() your app.

So, writing this a bit more completely, we get:

int luaCallbackTable;
int globalLuaFunctionRef;
void someCallback(int aValue, char* aString) {
    // Fetch a pointer to the shared Lua state object
    lua_State *L = some_shared_lua_state_provider();

    // Push onto the stack, the table we keep callback references in, from Lua's global registry
    lua_rawgeti(L, LUA_REGISTRYINDEX, luaCallbackTable);

    // Push onto the stack, from our callback reference table, the Lua function previously supplied by the user
    lua_rawgeti(L, -1, globalLuaFunctionRef);

    // Push the two arguments for the Lua function
    lua_pushinteger(L, aValue);
    lua_pushstring(L, aString);

    // Protected call to the Lua function, telling Lua to expect two arguments
    lua_pcall(L, 2, 0, 0);

    return;
}

Ok so this is looking better, we have our own table for neatly storing function references and we'll no longer abort() if the Lua function throws an error.

However, we now have a problem, we're leaking at least one item onto Lua's stack and possibly two. Unlike in the Lua→C case, we are not operating within the safe confines of a pseudo-stack, so anything we leak here will stay permanently on the stack, and at some point that's likely to cause the stack to overflow.

Now here is the kicker - stack overflows are really hard to find by default, you don't typically get a nice error, your program will simply leak stack slots until the stack overflows, far from the place where the leak is happening, then segfault, and your backtraces will have very normal looking Lua API calls in them.

If we were to handle the stack properly, the above code would actually look like this (and note that we've gone from four Lua API calls in the first C→Lua example, to eight here):

int luaCallbackTable;
int globalLuaFunctionRef;
void someCallback(int aValue, char* aString) {
    // Fetch a pointer to the shared Lua state object
    lua_State *L = some_shared_lua_state_provider();

    // Find luaCallbackTable in the Lua registry, and push it onto the stack
    lua_rawgeti(L, LUA_REGISTRYINDEX, luaCallbackTable);

    // Find globalLuaFunctionRef in luaCallbackTable, and push it onto the stack
    lua_rawgeti(L, -1, globalLuaFunctionRef);

    // Remove luaCallbackTable from the stack *THIS WAS LEAKED IN THE ABOVE EXAMPLE*
    lua_remove(L, -2);

    // Push the two arguments for the Lua function
    lua_pushinteger(L, aValue);
    lua_pushstring(L, aString);

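    // lua_pcall() returns LUA_OK (0) on success and a non-zero status on error
    if (lua_pcall(L, 2, 0, 0) != LUA_OK) {
        // Fetch the Lua error message from the stack
        const char *someError = lua_tostring(L, -1);
        printf("ERROR: %s\n", someError);

        // Remove the Lua error message from the stack *THIS WAS LEAKED IN THE ABOVE EXAMPLE*
        // (lua_pop() takes a count of items to pop, not a stack index)
        lua_pop(L, 1);
    }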

    return;
}

Hammerspoon has been having problems like this for the last few months - lots of crash reports that on the surface, look like completely valid code was executing. I have to admit that it took me a lot longer than it should have, to realise that these were Lua stack overflows rather than my initial suspicion (C heap corruption), but we figured it out eventually and have hopefully fixed all of the leaks.

So, how did we discover that the problem was stack overflows, and how did we discover where all of the leaks were without manually auditing all of the places where we make C→Lua transitions (of which there are over 100)? The answer to the first question is very simple: by defining LUA_USE_APICHECK when compiling Lua, it will do a little extra work to verify its consistency. Crucially, this includes calling abort() with a helpful message when the stack overflows. We turned this on for developers in March and then released 0.9.61 with it enabled, in early April. It's not normally recommended to have the API checker enabled in production because it calls abort(), but we felt that it was important to get more information about the crashes we couldn't reproduce.

Within a few days we started getting crash reports with the words stack overflow in them (as well as a few other errors, which we were able to fix), but that is only half the battle.

Having discovered that we did definitely have a stack leak somewhere, how did we discover where it was? This did involve a little brute force effort, but thankfully not a full manual audit of all 107 C→Lua call sites. Instead, I wrote two macros:

#define _lua_stackguard_entry(L) int __lua_stackguard_entry=lua_gettop(L);
#define _lua_stackguard_exit(L) assert(__lua_stackguard_entry == lua_gettop(L));

These are very simple to use - you call _lua_stackguard_entry() just after you've obtained a pointer to the Lua state object, and then you call _lua_stackguard_exit() at every point where the function can return after that. The entry macro records the size of the stack (lua_gettop()) and the exit macro assert()s that it's the same at the exit point (assert() also calls abort() if something is wrong, so now we would get crash logs with the crash in the actual function where the leak is happening). These entry/exit calls were added to all 107 call sites 4 days after 0.9.61 was released and I spent 3 evenings testing or manually verifying every site, before releasing 0.9.65 (0.9.62-0.9.64 fixed some of the other bugs found by the API checker in the meantime).
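
If the idea behind the guard isn't obvious from the macros alone, here is a toy model of it. To be clear, this is JavaScript rather than the Lua C API: a plain array stands in for the Lua stack, and withStackGuard, tidyCallback and leakyCallback are all made up for the illustration.

```javascript
// Toy stand-in for the Lua stack: a plain array. Not the real Lua C API.
function withStackGuard(stack, fn) {
  var entryTop = stack.length;          // like lua_gettop() at entry
  fn(stack);
  if (stack.length !== entryTop) {      // like the assert() at exit
    throw new Error("stack guard: entered with " + entryTop +
                    " slots, exited with " + stack.length);
  }
}

// A balanced callback: everything it pushes, it also removes
function tidyCallback(stack) {
  stack.push("callbackTable");
  stack.push("function");
  stack.splice(stack.length - 2, 1);    // like lua_remove(L, -2)
  stack.pop();                          // the call consumes the function
}

// A leaky callback: the table is never removed
function leakyCallback(stack) {
  stack.push("callbackTable");
  stack.push("function");
  stack.pop();                          // the call consumes the function
}

var stack = [];
withStackGuard(stack, tidyCallback);    // passes silently
try {
  withStackGuard(stack, leakyCallback);
} catch (e) {
  console.log(e.message);
  // prints "stack guard: entered with 0 slots, exited with 1"
}
```

The point is that the failure is reported in the function that leaked, rather than much later when the stack finally overflows somewhere unrelated.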

At the time of writing we're only 24 hours past the release of 0.9.65, but so far things are looking good - no strange Lua segfault crash reports as yet. There was one issue found today where I'd placed a _lua_stackguard_exit() call after a C statement that seemed unimportant, but actually caused an important object to be freed, but that is already fixed and will be included in 0.9.66.

Assuming we have now fixed the problem, after months of head-scratching, and a few weeks of research, testing and coding, it turns out that across the 107 call sites we only had two stack leaks - one was in the code that handles tab completion in Hammerspoon's Console window, and the other was in hs.notify. Hopefully you're all enjoying a more stable Hammerspoon experience, but I think we'll be leaving both the API checker and the stack guard macros enabled since they make it very easy to find/fix these sorts of bugs. I'd rather get a smaller number of crashes sooner, than have more months of head-scratching!



Getting battery data from AirPods in macOS

A recent feature request for Hammerspoon asked us to add support for reading battery information from AirPods.

Unfortunately because their battery status is quite complex (two earbuds and the case), this information is not reported via the normal IOKit APIs, but with a bit of poking around in the results of class-dump for macOS High Sierra I was able to find some relevant methods/properties on IOBluetoothDevice that let you read information about the battery level of individual AirPods and the case, plus determine which of the buds are currently in an ear!

So, the next release of Hammerspoon should include this code to expose all of this information neatly via hs.battery.privateBluetoothBatteryInfo() 😁