Home networking like a pro - Part 1.1 - Network Storage Redux

Back in this post I described having switched from a Mac Mini + DAS setup, to a Synology and an Intel NUC setup, for my file storage and server needs.

For a time it was good, but I found myself wanting to run more server daemons, and the NUC wasn't really able to keep up. The Synology was plodding along fine, but I made the decision to unify them all into a more beefy Linux machine.

So, I bought an AMD Ryzen 5 1600 CPU and an A320M motherboard, 16GB of RAM and a micro ATX case with 8 drive bays, and set to work. That quickly proved to be a disaster because Linux wasn't stable on the AMD CPU - I hadn't even thought to check, because why wouldn't Linux be stable on an x86_64 CPU in 2018?! With that lesson learned, I swapped out the board/CPU for an Intel i7-8700 and a Z370 motherboard.

I didn't go with FreeNAS as my previous post suggested I might, because ultimately I wanted complete control, so it's a plain Ubuntu Server machine that is fully managed by Ansible playbooks. In retrospect it was a mistake to try and delegate server tasks to an appliance like the Synology, and it was a further mistake to try and deal with that by getting the NUC - I should have just cut my losses and gone straight to a Linux server. Lesson learned!

Instead of getting lost in the weeds of purchase choices and justifications, instead let's look at some of the things I'm doing to the server with Ansible.

First up is root disk encryption - it's nice to know that your data is private when at rest, but a headless machine in a cupboard is not a fun place to be typing a password on boot. Fortunately I have two ways round this - firstly, a KVM (a Lantronix Spider) and secondly, one can add dropbear to an initramfs so you can ssh into the initramfs to enter the password.

Here's the playbook tasks that put dropbear into the initramfs:

- name: Install dropbear-initramfs
    name: dropbear-initramfs
    state: present

- name: Install busybox-static
    name: busybox-static
    state: present

# This is necessary because of
- name: Add initramfs hook to fix cryptroot-unlock
    dest: /etc/initramfs-tools/hooks/zz-busybox-initramfs-fix
    src: dropbear-initramfs/zz-busybox-initramfs-fix
    mode: 0744
    owner: root
    group: root
  notify: update initramfs

- name: Configure dropbear-initramfs
    path: /etc/dropbear-initramfs/config
    regexp: 'DROPBEAR_OPTIONS'
    line: 'DROPBEAR_OPTIONS="-p 31337 -s -j -k -I 60"'
  notify: update initramfs

- name: Add dropbear authorized_keys
    dest: /etc/dropbear-initramfs/authorized_keys
    src: dropbear-initramfs/dropbear-authorized_keys
    mode: 0600
    owner: root
    group: root
  notify: update initramfs

# The format of the ip= kernel parameter is: <client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# It comes from
- name: Configure boot IP and consoleblanking
    path: /etc/default/grub
    line: 'GRUB_CMDLINE_LINUX_DEFAULT="ip= loglevel=7 consoleblank=0"'
  notify: update grub

While this does rely on some external files, the important one is zz-busybox-initramfs-fix which works around a bug in the busybox build that Ubuntu is currently using. Rather than paste the whole script here, you can see it here.

The last task in the playbook configures Linux to boot with a particular networking config on a particular NIC, so you can ssh in. Once you're in, just run cryptsetup-unlock and your encrypted root is unlocked!

Another interesting thing I'm doing, is using Borg for some backups. It's a pretty clever backup system, and it works over SSH, so I use the following Ansible task to allow a particular SSH key to log in to the server as root, in a way that forces it to use Borg:

- name: Deploy ssh borg access
    user: root
    state: present
    key_options: 'command="/usr/bin/borg serve --restrict-to-path /srv/tank/backups/borg",restrict'
    key: "ssh-rsa BLAHBLAH cmsj@foo"

Now on client machines I can run borg create --exclude-caches --compression=zlib -v -p -s ssh://gnuborg:22/srv/tank/backups/borg/foo/backups.borg::cmsj-{utcnow} $HOME and because gnuborg is defined in ~/.ssh/config it will use all the right ssh options (username, hostname and the SSH key created for this purpose):

Host gnuborg
  User root
  Hostname gnubert.local
  IdentityFile ~/.ssh/id_rsa_herborg

Homebridge server monitoring

Homebridge is a great way to expose arbitrary devices to Apple's HomeKit platform. It has helped bridge the Google Nest and Netgear Arlo devices I have in my home, into my iOS devices, since neither of those manufacturers appear to be interested in becoming officially HomeKit compatible.

London has been having a little bit of a heatwave recently and it got me thinking about the Linux server I have running in a closet under the stairs - it has pretty poor airflow available to it, and I didn't know how hot its CPU was getting.

So, by the power of JavaScript, Homebridge and Linux's /sys filesystem, I was able to quickly whip up a plugin for Homebridge that will read an entry from Linux's temperature monitoring interface, and present it to HomeKit. In theory I could use it for sending notifications, but in practice I'm doing that via Grafana - the purpose of getting the information in HomeKit is so I can ask Siri what the server's temperature is.

The configuration is very simple, allowing you to configure one temperature sensor per instance of the plugin (but you could define multiple instances in your Homebridge config.json):

    "accessory": "LinuxTemperature",
    "name": "gnubert",
    "sensor_path": "/sys/bus/platform/devices/coretemp.0/hwmon/hwmon0/temp1_input",
    "divisor": 1000

(gnubert is the hostname of my server).

Below is a screenshot showing the server's CPU temperature mingling with all of the Nest and Arlo items :)


A little bit of automation of the Trello Mac App

Trello have a Mac app, which I use for work and it struck me this morning that several recurring calendar events I have, which exist to remind me to review a particular board, would be much more pleasant if they contained a link that would open the board directly.

That would be easy if I used the Trello website, but I quite like the app (even though it's really just a browser pretending to be an app), so I went spelunking.

To cut a long story short, the Trello Mac app registers itself as a handler for trello:// URLs, so if you take any board URL and replace the https:// part with trello:// you can use it as a link in your calendar (or anywhere else) and it will open the board in the app.

Homebridge in Docker, an adventure in networking

Homebridge is a great way of connecting loads of devices that don't support Apple's HomeKit, to your iOS devices. It consists of a daemon that understands the HomeKit Accessory Protocol and many plugins that talk to other devices/services.

My home server is running Ubuntu, so installing Homebridge is fairly trivial, except I run all my services in Docker containers. To make things even more fun, I don't build or manage the containers by hand - the building is done by Docker Hub and the containers are deployed and managed by Ansible.

So far so good, except that for a long time Homebridge used Avahi (an Open Source implementation of Apple's Bonjour host/service discovery protocol) to announce its devices. That presented a small challenge in that I didn't want to have Avahi running only in that container, so I had to bind mount /var/run/avahi-daemon/ into the container.

I recently rebuilt my Homebridge container to pull it up to the latest versions of Homebridge and the plugins I use, but it was no longer announcing devices on my LAN, and there were no mentions of Avahi in its log. After some digging, it turns out that the HomeKit Accessory Protocol (HAP) library that Homebridge uses, now instantiates its own multicast DNS stack rather than using Avahi.

Apart from not actually working, this was great news, I could remove the /var/run bind mount from the container, making things more secure, I just needed to figure out why it wasn't showing up.

The HAP library that Homebridge uses, ends up depending on this library to implement mDNS and it makes a very simple decision about which network interface it should use. In my case, it was choosing the docker0 bridge interface which explicitly isn't connected to the outside world. With no configuration options at the Homebridge level to influence the choice of interface, I had to solve the problem at the Docker network layer.

So, the answer was the following Ansible task to create a Docker network that is attached to my LAN interface (bridge0) and give it a small portion of a reserved segment in the IP subnet I use:

- name: Configure LANbridge network
    name: lanbridge
    driver: macvlan
      parent: bridge0
      subnet: ''
      gateway: ''
      iprange: ''

then change the task for the Homebridge container to use this network:

  network_mode: lanbridge

and now Homebridge is up to date, and working, plus I have a Docker network I can use in the future if any other containerised services need to be very close to the LAN.

Receiving remote syslog events with systemd

Systemd includes journald, a fancy replacement for the venerable syslog daemon (and its descendents, syslog-ng and rsyslog).

One interesting, but frustrating, decision by journald's maintainers is that it does not speak the syslog network protocol, so it's unable to receive remote syslog events. Remote syslog is a tremendously useful feature for aggregating log data from many hosts on a network - I've always used it so my network devices can log somewhere I'm likely to look at, but I haven't been able to do that since journald arrived.

While there are many ways to skin this goose, the method I've chosen is a tiny Python daemon that listens on syslog's UDP port (514), does minimal processing of the data and then feeds it into journald via its API, to get the data as rich as possible (since one of journald's strengths is that it can store a lot more metadata about a log entry).

So, here is the source for the daemon, and here is the systemd service file that manages it - note that it runs as an unprivileged user, with the sole privilege escalation of being able to bind to low port numbers (something only root can do normally).

The daemon is certainly not perfect (patches welcome!), but it works. Here is a journald log entry from one of my UniFi access points:

Jun 15 21:28:26 gnubert ("U7PG2,802aa8d48ab3,v3.9.27.8537")[23506]: kernel: [4251792.410000] [wifi1] FWLOG: [58855274] BEACON_EVENT_SWBA_SEND_FAILED (  )

(the more syslog-obsessed among you will notice that I'm setting the identifier to the hostname of the device that sent the message. Internally, the facility is mapped correctly, as is the priority. The text of the message then appears, prepended by its identifier.

Adventures in Lua stack overflows

Hammerspoon is heavily dependent on Lua - it's the true core of the application, so it's unavoidable that we have to interact with Lua's C API in a lot of places. If you've never used it before, Lua's C API is designed to be very simple to integrate with other code, but it also places a fairly high burden on developers to integrate it properly.

One of the ways that Lua remains simple is by being stack based - when you give Lua a C function and make it available to call from Lua code, you have to conform to a particular way of working. The function arguments supplied by the user will be presented to you on a stack, and when your C code has finished its work, the return values must have been pushed onto the stack. Here's an example:

static int someUsefulFunction(lua_State *L) {
    // Fetch our first argument from the stack
    int someNumber = lua_tointeger(L, 1);

    // Fetch our second argument from the stack
    char *someString = lua_tostring(L, 2);

    /* Do some useful work here */

    // Push two return values onto the stack and return 2 so Lua knows how many return values we provided
    lua_pushstring(L, "some result text");
    lua_pushinteger(L, 42);
    return 2;

All simple enough.

In this scenario of calling from Lua→C, Lua creates a pseudo-stack for you, so while it's good practice to keep the stack neat and tidy (i.e. remove things from it that you don't need), it's not critical because apart from the return values, the rest of the stack is thrown away. That pseudo-stack only has 20 slots by default though, so if you're pushing a lot of return arguments, or using the stack for other things, you may need to use lua_checkstack() to grow it larger, up to the maximum (2048 slots).

Where things get more interesting, is when you're interacting with the Lua stack without having crossed a Lua→C boundary. For example, maybe you're in a callback function that's been triggered by some event in your C program, and now you need to call a Lua function that the user gave you earlier. This might look something like this:

int globalLuaFunction;
void someCallback(int aValue, char* aString) {
    // Fetch a pointer to the shared Lua state object
    lua_State *L = some_shared_lua_state_provider();

    // Push onto the stack, the Lua function previously supplied by the user, from Lua's global registry
    lua_rawgeti(L, LUA_REGISTRYINDEX, globalLuaFunction);

    // Push the two arguments for the Lua function
    lua_pushinteger(L, aValue);
    lua_pushstring(L, aString);

    // Call the Lua function, telling Lua to expect two arguments
    lua_call(L, 2, 0);


Slightly more complex than the last example, but still manageable. Unfortunately in practice this is a fairly suboptimal implementation of a C→Lua call - storing things in the LUA_REGISTRYINDEX table is fine, but it's often nicer to use multiple tables for different things. The big problem here though is that lua_call() doesn't trap errors. If the Lua code raises an exception, Lua will longjmp to a panic handler and abort() your app.

So, writing this a bit more completely, we get:

int luaCallbackTable;
int globalLuaFunctionRef;
void someCallback(int aValue, char* aString) {
    // Fetch a pointer to the shared Lua state object
    lua_State *L = some_shared_lua_state_provider();

    // Push onto the stack, the table we keep callback references in, from Lua's global registry
    lua_rawgeti(L, LUA_REGISTRYINDEX, luaCallbackTable);

    // Push onto the stack, from our callback reference table, the Lua function previously supplied by the user
    lua_rawgeti(L, -1, globalLuaFunctionRef);

    // Push the two arguments for the Lua function
    lua_pushinteger(L, aValue);
    lua_pushstring(L, aString);

    // Protected call to the Lua function, telling Lua to expect two arguments
    lua_pcall(L, 2, 0, 0);


Ok so this is looking better, we have our own table for neatly storing function references and we'll no longer abort() if the Lua function throws an error.

However, we now have a problem, we're leaking at least one item onto Lua's stack and possibly two. Unlike in the Lua→C case, we are not operating within the safe confines of a pseudo-stack, so anything we leak here will stay permanently on the stack, and at some point that's likely to cause the stack to overflow.

Now here is the kicker - stack overflows are really hard to find by default, you don't typically get a nice error, your program will simply leak stack slots until the stack overflows, far from the place where the leak is happening, then segfault, and your backtraces will have very normal looking Lua API calls in them.

If we were to handle the stack properly, the above could would actually look like this (and note that we've gone from four Lua API calls in the first C→Lua example, to eight here):

int luaCallbackTable;
int globalLuaFunctionRef;
void someCallback(int aValue, char* aString) {
    // Fetch a pointer to the shared Lua state object
    lua_State *L = some_shared_lua_state_provider();

    // Find luaCallbackTable in the Lua registry, and push it onto the stack
    lua_rawgeti(L, LUA_REGISTRYINDEX, luaCallbackTable);

    // Find globalLuaFunctionRef in luaCallbackTable, and push it onto the stack
    lua_rawgeti(L, -1, globalLuaFunctionRef);

    // Remove luaCallbackTable from the stack *THIS WAS LEAKED IN THE ABOVE EXAMPLE*
    lua_remove(L, -2);

    // Push the two arguments for the Lua function
    lua_pushinteger(L, aValue);
    lua_pushstring(L, aString);

    if (lua_pcall(L, 2, 0, 0) == false) {
        // Fetch the Lua error message from the stack
        char *someError = lua_tostring(L, -1);
        printf("ERROR: %s\n", someError);

        // Remove the Lua error message from the stack *THIS WAS LEAKED IN THE ABOVE EXAMPLE*
        lua_pop(L, -1);


Hammerspoon has been having problems like this for the last few months - lots of crash reports that on the surface, look like completely valid code was executing. I have to admit that it took me a lot longer than it should have, to realise that these were Lua stack overflows rather than my initial suspicion (C heap corruption), but we figured it out eventually and have hopefully fixed all of the leaks.

So, how did we discover that the problem was stack overflows, and how did we discover where all of the leaks were without manually auditing all of the places where we make C→Lua transitions (of which there are over 100). The answer to the first question is very simple, by defining LUA_USE_APICHECK when compiling Lua, it will do a little extra work to verify its consistency. Crucially, this includes calling abort() with a helpful message when the stack overflows. We turned this on for developers in March and then released 0.9.61 with it enabled, in early April. It's not normally recommended to have the API checker enabled in production because it calls abort(), but we felt that it was important to get more information about the crashes we couldn't reproduce.

Within a few days we started getting crash reports with the words stack overflow in them (as well as a few other errors, which we were able to fix), but that is only half the battle.

Having discovered that we did definitely have a stack leak somewhere, how did we discover where it was? This did involve a little brute force effort, but thankfully not a full manual audit of all 107 C→Lua call sites. Instead, I wrote two macros:

#define _lua_stackguard_entry(L) int __lua_stackguard_entry=lua_gettop(L);
#define _lua_stackguard_exit(L) assert(__lua_stackguard_entry == lua_gettop(L));

These are very simple to use - you call _lua_stackguard_entry() just after you've obtained a pointer to the Lua state object, and then you call _lua_stackguard_exit() at every point where the function can return after that. It records the size of the stack (lua_gettop()) at the entry point and assert()s that it's the same at the exit point (assert() also calls abort() if something is wrong, so now we would get crash logs with the crash in the actual function where the leak is happening). These entry/exit calls were then added to all 107 call sites 4 days after the 0.9.61 was released and I spent 3 evenings testing or manually verifying every site, before releasing 0.9.65 (0.9.62-0.9.64 fixed some of the other bugs found by the API checker in the mean time).

At the time of writing we're only 24 hours past the release of 0.9.65, but so far things are looking good - no strange Lua segfault crash reports as yet. There was one issue found today where I'd placed a _lua_stackguard_exit() call after a C statement that seemed unimportant, but actually caused an important object to be freed, but that is already fixed and will be included in 0.9.66.

Assuming we have now fixed the problem, after months of head-scratching, and a few weeks of research, testing and coding, it turns out that across the 107 call sites we only had two stack leaks - one was in the code that handles tab completion in Hammerspoon's Console window, and the other was in hs.notify. Hopefully you're all enjoying a more stable Hammerspoon experience, but I think we'll be leaving both the API checker and the stack guard macros enabled since they make it very easy to find/fix these sorts of bugs. I'd rather get a smaller number of crashes sooner, than have more months of head-scratching!

Discuss on Twitter | Discuss on Hacker News

Getting battery data from AirPods in macOS

A recent feature request for Hammerspoon requested that we add support for reading battery information about AirPods (UK US).

Unfortunately because their battery status is quite complex (two earbuds and the case), this information is not reported via the normal IOKit APIs, but with a bit of poking around in the results of class-dump for macOS High Sierra I was able to find some relevant methods/properties on IOBluetoothDevice that let you read information about the battery level of individual AirPods and the case, plus determine which of the buds are currently in an ear!

So, the next release of Hammerspoon should include this code to expose all of this information neatly via hs.battery.privateBluetoothBatteryInfo() 😁

Happy 10th Birthday Terminator!

Today marks 10 years since the very first public release of Terminator, a multiplexing terminal emulator project.

I started working on Terminator as a simple way to get 4 terminals to not overlap on my laptop screen. In the following years it grew many features and attracted a userbase I am very proud of.

As much as I would like to, I cannot claim most of the credit for Terminator surviving for a decade - I stepped away from the project a few years ago and handed the reigns over to Stephen Boddy at a very crucial time - gtk2 was becoming ever more obsolete and our work on a gtk3 port was very incomplete. Stephen has driven the project forward and it now has a very good gtk3 version :)

So, thank you to Stephen and everybody else who contributed code/docs/translations/suggestions/bugs/etc over the last 10 years (you can see the most active folk here).

I'd also like to note that according to Ubuntu's Popularity Contest data, Terminator is installed on at least 56000 Ubuntu machines. Debian also has PopCon data, but the numbers there are a little less impressive ;)

This was the first project of mine that reached any kind of significant audience, and is the first project of mine to have achieved a decade of active maintenance, so I am feeling pretty happy today!

USB Type C is awful

Intentionally inflammatory title there, but there are some valid reasons to be annoyed at USB Type C.

Firstly, I have discovered (the hard way) that although there are many cables on sale, the majority of Type C cables do not support even USB 3.0 speeds (so 5Gb/s) let alone USB 3.1gen2 (so 10Gb/s) speeds. They are actually USB 2.0 (so 480Mb/s) cables.

I can understand that some devices with Type C connectors may only need USB 2.0 speeds, but for the cables to not all be USB 3.x seems crazy to me. Even Apple is doing this - the charging cables for MacBooks (12" and Pros) with Type C ports, only support USB 2.0 speeds. If I had to guess, I'd say it's because they wanted the cables to be thin and easily bendable, which full USB 3.1 cables tend not to be.

Secondly, unlike Type A, which marks USB 3.x cables by having blue plastic inside the connectors, there is no way to tell what speed a Type C cable is by looking at it.

Thirdly, and perhaps most importantly, USB Type C is a vastly more powerful beast than previous versions - a modern Type C port can be capable of:

  • 40Gb/s Thunderbolt 3
  • DisplayPort
  • 10Gb/s USB 3.1gen2
  • 100W of (actively negotiated) power delivery in either direction

The DisplayPort "alternate mode" can deliver 4K at 60Hz with USB3.1 at the same time, or 5K with USB2.0 at the same time, or 8K (compressed). When used as Thunderbolt, Type C can carry 5K video as well as the PCI data.

So while one tiny connector can do a whole bunch of really impressive things, the cables are now expected to do vastly more than even USB3.0 Type A cables, let alone USB 2.0, and it seems like that advanced capability isn't currently aligned with the history of USB as being an ultra-cheap, mass market affair.

Various awesome folk have put together a spreadsheet of the chargers/cables they've tested, and it seems like a serious chunk of the Type C cables currently on the market, are junk. This is bad for everyone, especially users, who can buy what looks like the right cable, only to discover that their devices either don't work at all, or work to slowly, or won't charge properly.

This post exists because I needed a USB3.0, three metre, Type A to Type C cable, and I bought one on Amazon, only to discover that it only supported USB2.0. After far too much searching, I eventually found an Anker cable which meets my requirements:

Home networking like a pro - Part 1 - Network Storage


This is part one in a series of posts about some hardware I recommend (or otherwise!) for people who want to bring some semi-professional flair to their home network.

The first topic is storage - specifically, Network Attached Storage.


For the last few years, I was running a Mac Mini with two 3TB drives in a RAID1 array in a LaCie 2big Thunderbolt chassis (US UK), with the Mac running macOS Server to provide file sharing (AFP and SMB), and Time Machine backups for the rest of the network.

This was a very nice solution, in that the Mac was a regular computer, so I could make it do whatever I wanted, but it did have the drawbacks that the Thunderbolt chassis only had two drive bays, and I had trouble getting the Mac to run reliably for months at a time (I ran into GPU related kernel panics, perhaps because it was attached to a TV rather than a monitor).

Around the time I was selecting the Mac/LaCie, most NAS devices in a similar price range were very underpowered ARM devices, and could do little more than share files, but in 2017 almost all NAS devices are much more powerful x86 devices that often have extensive featuresets (e.g. running containers, VMs, hardware accelerated video transcoding, etc.) so I decided it was time to switch.


I ended up choosing a Synology DS916+ (US UK), popped one of the 3TB drives (Western Digital Red (US UK)) out of the LaCie and into the Synology, and set about migrating my data over. I then moved the other drive, and put in two more 3TB drives, all of which are running as a single Synology Hybrid RAID volume with a BTRFS filesystem (note that the Hybrid RAID really seems to just be RAID5).

I configured the Synology to serve files over both AFP and SMB, and enabled its support for Time Machine via AFP. I was also able to connect both of its Ethernet ports to my switch (a ZyXEL GS1900 24 port switch, which I will cover in an upcoming post) and enabled LACP on each end to bond the two connections into a single 2Gb link.

So, how did it work out?

The AFP file sharing is great, and works flawlessly. SMB is a little more complex, because recent versions of macOS tend to enforce encryption on SMB connections, which makes them go much slower, but this can be disabled. I tested Time Machine over SMB, which is officially supported by Synology, but is a very recent addition, and it proved to be unreliable, so that is staying on AFP for now.

Something I was particularly keen on, with the Synology, was that it has an "app store" and one of the available applications is Docker. I was running a few UNIX daemons on the Mac Mini which I wanted to keep, and Docker containers would be perfect for them, however, I discovered that the version of Docker provided by Synology is pretty old and I ran into a strange bug that would cause dockerd to consume all available CPU cycles.

For now, the containers are running on an Intel NUC (which will also be covered in an upcoming post) and the Synology is focussed on file sharing.

Open Source

Synology's NAS products are based on Linux, Samba, netatalk and a variety of other Open Source projects, with their custom management GUI on top. They do publish source, but it's usually a little slow to arrive on the site, and it's not particularly easy (or in some cases even possible) to rebuild in a way that lets you actively customise the device.


Overall, I like the Synology, but I think if I'd known about the Docker issue, I might have built my own machine and put something like FreeNAS on it. More work, less support, but more flexibility.

The recent 5-8 drive Synologies now support running VMs, which seems like a very interesting prospect, since it ought to isolate you from Synology's choices of software versions.