Archive

Funky lcd4linux python module

I've got an LCD on the way, to put in my fileserver and show some status/health info. Rather than wait for the thing to arrive, I've gone ahead and started making the config I want with lcd4linux. Since the LCD I'm getting is only 20 characters wide and 4 lines tall, there is not very much space, so I've had to get pretty creative with how I'm displaying information. One thing I wanted was to show the percentage used of the various disks in the machine, but since I have at least 3 mount points, that would mean either scrolling text (ugly) or consuming ¾ of the display (inefficient). It seemed like a much nicer idea to use a single line to represent the space used as a percentage and simply display each of the mounts in turn. Unfortunately lcd4linux's "Evaluator" syntax is not sufficiently complex to implement this directly, so I faced the choice of either writing a C plugin or passing the functionality off to a Python module. I tend to think this feature ought to be implemented as a C plugin, because that would make it easier to use, but since I prefer Python I went with a Python module :)

The code is on github and the included README.md covers how to use it in an lcd4linux configuration. At some point soon I'll post my lcd4linux configuration - just as soon as I've figured out what to do with the precious 4th line. In the meantime, here is a video of the rotator plugin operating on the third line (the first line being disk activity and the second line being network activity):

Update: I figured out what to do with the fourth line:

That's another Python module, this time a port of Chris Applegate's Daily Mail headline generator from JavaScript to Python. The code is on github. As promised, the complete lcd4linux config is available here (also on github).


Using Caps Lock as a new modifier key in OS X

Update: I have moved this post to its own page, see http://www.tenshu.net/p/fake-hyper-key-for-osx.html for the latest version.


Paperless workflow

Introduction

This is going to be quite a long post, but hopefully interesting to a particular crowd of people. I'm going to tell you all about how I have designed and built a paperless workflow for myself.

Background

This came about some months ago when I needed to find several important documents that were spread through the various organised files that I keep things in. The search took much longer than I would have liked, partly because I am not very efficient at putting paper into the files.

You could suggest that I just get better at doing that, but even if I were to do that, it still only makes me quicker at finding paperwork from the files on my shelf. If I want to really kick things up a gear, the files need to be electronic, accessible from anywhere and powerfully searchable.

The hardware

I started thinking about what I would want. Obviously a scanner was going to be the first prerequisite of being able to digitise my papers, but what kind to get? After investigating what other people had already said about paperless workflows, it seemed like the ScanSnap range of scanners was a popular choice, but they are quite expensive and it's one more thing on my desk. Instead I decided to go for a multi-function inkjet printer - their scanners are good enough, and even though they're bigger than a ScanSnap, I'm also getting a printer into the bargain.

So which one to get? Well that depended on which features were important. My highest priority in this project was that the process of taking a document from paper to my laptop had to be as simple as possible, so in the realms of scanning devices, that means you need one which can automatically scan both sides of the paper.

This turns out to be quite rare in multi-function printers, but after a great deal of research I found the Epson Stylus Office BX635FWD, which has a duplex ADF (Automatic Document Feeder), is very well supported in OS X, and is a decent printer (which, for bonus points, supports Apple's AirPrint and Google's Cloud Print standards).

The setup of the Epson was extremely pleasing - it has a little LCD screen and various buttons, which meant that I could power it up and join it to my WiFi network without having to connect it to a computer via USB at all. I then added it as a printer on my laptop (which was easy since the printer was already announcing itself on the WiFi network) and OS X was happy to do both printing and scanning over WiFi.

I then investigated the Epson software for it and found that I didn't have to install a giant heap of drivers and applications; I could pick and choose which pieces I wanted. Specifically, I was interested in whether I could react to the Scan button being pressed on the printer, even though it was not connected via USB. It turns out that this is indeed possible, via a little application called EEventManager. With that set up to process the scans to my liking (specifically: colour, 300DPI, assembled into a PDF and saved into a particular temporary directory), the hardware stage of the project was over.

With the ability to turn paper into a PDF with a couple of button presses on the printer itself, I was ready to figure out what to do with it next.

The software

As people with a focus on paperless workflows (such as David Sparks) have rightly pointed out, there are several stages to a paperless workflow - capture, processing and recall. At this point I had the capture stage sorted, so the next stage was processing.

When you have a PDF with scanned images inside it, you obviously can't do anything with the text on the pages - it's not computer-readable text, it's a picture. It turns out, though, that it is possible to tell the PDF what the words are and where they are on the page, which makes the text selectable. So my attention turned to OCR (Optical Character Recognition) software. I didn't engage in a particularly detailed survey because I came across a great deal on Nuance's PDF Converter For Mac product and was so impressed with its trial copy that I snapped up the deal and forged ahead. I hear good things about PDFPen, but I've never tried it.

Automation

Having a directory full of scanned documents and some OCR software is a good place to be, but it's not a great place to be unless you can automate it. Fortunately, OS X has some pretty excellent automation tools.

The magic all happens in a single Automator workflow configured as a Folder Action on the directory that EEventManager is saving the PDFs into:

Workflow

It finds any PDF files in that temporary folder, then loops over them, opening each one in Nuance PDF Converter, running the OCR function and saving the PDF. Each file is then moved to an archive directory and renamed to a generic date/time based filename. That's it.
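
For the curious, the last couple of steps boil down to something like this in shell terms - a rough sketch only, and the directory names here are placeholders rather than anything from my actual setup:

# Roughly the effect of the final two Automator steps: move each OCR'd PDF
# into the archive and give it a generic date/time based name.
i=0
for f in "$HOME"/Scans/Incoming/*.pdf; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    i=$((i+1))
    mv "$f" "$HOME/Documents/Archive/$(date +%Y-%m-%d-%H%M%S)-$i.pdf"
done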

That's it

Like I said, that's it. If you've been paying attention, at this point you'll say "but wait, you said there was a third part of a paperless workflow - you need tools to recall the documents later!". You would be right to say that, but the good news is that OS X solves this problem for you with zero additional effort.

As soon as the PDF is saved with the computer-readable text that the OCR function produces, it is indexed by the system's search facility - Spotlight. Now all you need to do is hit Cmd-Space and type some keywords; you'll see all your matching documents and be able to get a preview. You can also open the search into a Finder window and see larger previews, change the sorting, edit the search terms, etc.
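
If you prefer the terminal, the same Spotlight index is also available via mdfind(1); for example (the archive path here is just an illustration of where you might keep the PDFs):

mdfind -onlyin ~/Documents/Archive "electricity bill"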

Future work

While that is it, there are things I'd like to do in future - specifically, I don't currently have an easy way to pull in attachments from emails, or downloaded PDFs; I have to go and drag them into the archive folder and optionally rename them. However, if you have your email hooked into the system email client (Mail.app) then it is already being indexed by Spotlight, including attachments, so there's no immediate hurry to figure out a solution for that.

I do also like the idea of detecting specific keywords (e.g. company names) in the documents and using those to file the PDFs in subdirectories, but I'm not sure if I actually need/want it, so for now I'm sticking with one huge directory of everything.


Photo import workflow

Introduction

Since I'm writing about workflows today, I thought I'd also quickly chuck in a guide to how I get the photos and movies that I've taken with my iPhone onto my laptop and, specifically, imported into Aperture.

The mechanics

This requires a few moving parts to produce a final workflow. The high-level process is:

  1. Plug the iPhone into a USB port
  2. Copy photos from the iPhone into a temporary directory, deleting them as they are successfully retrieved
  3. Import the photos into Aperture, ensuring they are copied into its library and deleted from the temporary directory

Simple, right? Well, yes and no.

Retrieval from iPhone


This really ought to be easier than it is, but at least it is possible. Aperture can import photos from devices, but it doesn't seem to offer the ability to delete them from the device after import. That alone makes it not even worth bothering with if you don't want to build up a ton of old photos on your phone. OS X does ship with a tool that can import photos from camera devices and delete the photos afterwards, a tool called AutoImporter.app, but you won't find it without looking hard. It lives at:

/System/Library/Image Capture/Support/Application/AutoImporter.app

If you run that tool, you will see no window, just a dock icon and some menus. Go into its Preferences and you will be able to choose a directory to import to, and choose whether or not to delete the files:

prefs

Easy!

Importing into Aperture


This involves using Automator to build a Folder Action workflow for the directory that AutoImporter is pulling the photos into. All it does is check to see if AutoImporter is still running (and if so, wait), then launch Aperture and tell it to import everything from that directory into a particular Project, and then delete the source files:

Aperture autoimport workflow
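
If you were to sketch the waiting part of that folder action as a shell script rather than Automator actions, it might look something like this - a rough approximation only; the actual import into Aperture and the clean-up are handled by the Automator workflow itself:

#!/bin/sh
# Wait until AutoImporter has finished copying files off the phone.
while ps ax | grep -q "[A]utoImporter"; do
    sleep 5
done
# Launch Aperture; the rest of the folder action drives the import and clean-up.
open -a Aperture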

That's it!

Really, that's all there is. Now whenever you plug in your iPhone, all of the pictures and movies you've taken recently will get imported into Aperture for you to process, archive, touch up, export or whatever else it is that you do with your photos and movies.


A sysadmin talks OpenSSH tips and tricks

My take on more advanced SSH usage

I've seen a few articles recently on sites like HackerNews which claimed to cover some advanced SSH techniques/tricks. They were good articles, but for me (as a systems administrator) they didn't get into the really powerful guts of OpenSSH.

So, I figured that I ought to pony up and write about some of the more advanced tricks that I have either used or seen others use. These will most likely be relevant to people who manage tens/hundreds of servers via SSH. Some of them are about actual configuration options for OpenSSH, others are recommendations for ways of working with OpenSSH.

Generate your ~/.ssh/config

This isn't strictly an OpenSSH trick, but it's worth noting. If you have other sources of knowledge about your systems, automation can do a lot of the legwork for you in creating an SSH config. A perfect example here would be if you have some kind of database which knows about all your servers - you can use that to produce a fragment of an SSH config, then download it to your workstation and concatenate it with various other fragments into a final config. If you mix this with distributed version control, your entire team can share a broadly identical SSH config, with allowance for each person to have a personal fragment for their own preferences and personal hosts. I can't recommend this sort of collaborative working enough.
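
The assembly step itself can be as simple as concatenating the fragments in a known order - something like the following, where the config.d directory and fragment names are purely illustrative:

# Rebuild ~/.ssh/config from personal, team-shared and generated fragments.
cat ~/.ssh/config.d/00-personal \
    ~/.ssh/config.d/10-team-shared \
    ~/.ssh/config.d/50-generated-from-inventory > ~/.ssh/config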

Generate your ~/.ssh/known_hosts

This follows on from the previous item. If you have some kind of database of servers, teach it the SSH host key of each one (usually something like /etc/ssh/ssh_host_rsa_key.pub), and then you can export a file with the keys and hostnames in the correct format to use as a known_hosts file, e.g.:

server1.company.com 10.0.0.101 ssh-rsa BLAHBLAHCRYPTOMUMBO

You can then associate this with all the relevant hosts by including something like this in your ~/.ssh/config:

Host *.mycompany.com
    UserKnownHostsFile ~/.ssh/generated_known_hosts
    StrictHostKeyChecking yes

This brings some serious advantages:

  • Safer - because you have pre-loaded all of the host keys and specified strict host key checking, SSH will prompt you if you connect to a machine and something has changed.
  • Discoverable - if you have tab completion, your shell will let you explore your infrastructure just by prodding the Tab key.

Keep your private keys private

This seems like it ought to be more obvious than it perhaps is... the private halves of your SSH keys are very privileged things. You should treat them with a great deal of respect. Don't put them on multiple machines (SSH keys are cheap to generate and revoke) and don't back them up.

Know your limits

If you're going to write a config snippet that applies to a lot of hosts you can't match with a wildcard, you may end up with a very long Host line in your SSH config. It's worth remembering that there is a limit to the length of lines: 1024 characters. If you need to exceed that, you will just have to have multiple Host sections with the same options.
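
For example (the hostnames and options here are made up), instead of one enormous Host line you simply repeat the stanza:

Host web01.company.com web02.company.com web03.company.com
    User deploy
    PreferredAuthentications publickey

Host web04.company.com web05.company.com web06.company.com
    User deploy
    PreferredAuthentications publickey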

Set sane global defaults

HashKnownHosts no
Host *
    GSSAPIAuthentication no
    ForwardAgent no

These are very sane global defaults:

  • Known hosts hashing is good for keeping your hostnames secret from people who obtain your known_hosts file, but is also really very inconvenient as you are also unable to get any useful information out of the file yourself (such as tab completion). If you're still feeling paranoid you might consider tightening the permissions on your known_hosts file as it may be readable by other users on your workstation.
  • GSSAPI is very unlikely to be something you need, it's just slowing things down if it's enabled.
  • Agent forwarding can be tremendously dangerous and should, I think, be actively and passionately discouraged. It ought to be a nice feature, but it requires that you trust remote hosts unequivocally, as if they had your private keys - because functionally speaking, they do. They don't actually have the private key material, but any sufficiently privileged process on the remote server can connect back to the SSH agent running on your workstation and ask it to respond to challenges from an SSH server. If you keep your keys unlocked in an SSH agent, this gives any privileged attacker on a server you are logged into trivial access to any other machine your keys can SSH into. If you somehow depend on using agent forwarding with Internet-facing servers, please reconsider your security model (unless you are able to robustly and accurately argue why your usage is safe, but if that is the case then you don't need to be reading a post like this!)

Notify useful metadata

If you're using a Linux or OS X desktop, you will have something like notify-send(1) or Growl available for desktop notifications. You can hook this into your SSH config to display useful metadata to yourself. The easiest way to do this is via the LocalCommand option:

Host *
    PermitLocalCommand yes
    LocalCommand /home/user/bin/ssh-notify.sh %h

This will call the ssh-notify.sh script every time you SSH to a host, passing the hostname you gave as an argument. In the script you probably want to ensure you're actually in an interactive terminal and not some kind of backgrounded batch session - this can be done trivially by checking that tty -s returns zero. Now the script just needs to go and fetch some metadata about the server you're connecting to (e.g. its physical location, the services that run on it, its hardware specs, etc.) and format it into a command that will display a notification.
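
As a starting point, a minimal ssh-notify.sh might look like the following - the metadata lookup is a placeholder you would replace with a query against whatever inventory you have, and on OS X you would call Growl's notifier rather than notify-send:

#!/bin/sh
# ssh-notify.sh - invoked by LocalCommand with the target hostname as $1.
tty -s || exit 0                    # only notify for interactive sessions

host="$1"
# Placeholder lookup: a flat file of "hostname description" pairs.
info=$(grep "^${host} " "$HOME/.ssh/host-metadata" 2>/dev/null | cut -d' ' -f2-)

notify-send "SSH: ${host}" "${info:-no metadata found}"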

Sidestep overzealous key agents

If you have a lot of SSH keys in your ssh-agent (e.g. more than about 5), you may have noticed that SSHing to machines which want a password, or to machines for which you wish to use a specific key that isn't in your agent, can be quite tricky. The reason for this is that OpenSSH currently seems to talk to the agent in preference to obeying command line options (i.e. -i) or config file directives (i.e. IdentityFile or PreferredAuthentications). You can force the behaviour you are asking for with the IdentitiesOnly option, e.g.:

Host server1.company.com
    IdentityFile /some/rarely/used/ssh.key
    IdentitiesOnly yes

(on a command line you would add this with -o IdentitiesOnly=yes)

Match hosts with wildcards

Sometimes you need to talk to a lot of almost identically-named servers. Obviously SSH has a way to make this easier or I wouldn't be mentioning this. For example, if you needed to ssh to a cluster of remote management devices:

Host *.company.com management-rack-??.company.com
    User root
    PreferredAuthentications password

This will match anything ending in .company.com and also anything that starts with management-rack- and then has two characters, followed by .company.com.

Per-host SSH keys

You may have some machines where you have a different key for each machine. By naming them after the fully qualified domain names of the hosts they relate to, you can skip over a more tedious SSH config with something like the following:

Host server-??.company.com
    IdentityFile /some/path/id_rsa-%h

(the %h will be substituted with the FQDN you're SSHing to. The ssh_config man page lists a few other available substitutions.)

Use fake, per-network port forwarding hosts

If you have network management devices which require web access that you normally forward ports for with the -L option, consider constructing a fake host in your SSH config which establishes all of the port forwards you need for that network/datacentre/etc:

Host port-forwards-site1.company.com
    Hostname server1.company.com
    LocalForward 1234 10.0.0.101:1234

This also means that your forwards will be on the same port each time, which makes saving certificates in your browser a reasonable undertaking. All you need to do is ssh port-forwards-site1.company.com (using nifty Tab completion of course!) and you're done. If you don't want it tying up a terminal you can add the options -f and -N to your command line, which will establish the ssh connection in the background.

If you're using programs which support SOCKS (e.g. Firefox and many other desktop Linux apps), you can use the DynamicForward option to send traffic over the SSH connection without having to add LocalForward entries for each port you care about. Using this with a browser extension such as FoxyProxy (which lets you configure multiple proxies based on wildcard/regexp URL matches) makes for a very flexible setup.
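
A SOCKS version of the fake host above might look like this (the host alias and port number are arbitrary choices, not anything special):

Host socks-site1.company.com
    Hostname server1.company.com
    DynamicForward 1080

Then run ssh -f -N socks-site1.company.com and point your browser's SOCKS proxy at localhost:1080 while that connection is up.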

Use an SSH jump host

Rather than have tens/dozens/hundreds/etc of servers holding their SSH port open to the Internet and being battered with brute force password cracking attempts, you might consider having a single host listening (or a single host per network perhaps), which you can proxy your SSH connections through.

If you do consider something like this, you must resist the temptation to place private keys on the jump host - to do so would utterly defeat the point.

Instead, you can use an old, but very nifty trick that completely hides the jump host from your day-to-day usage:

Host jumphost.company.com
    ProxyCommand none
Host *.company.com
    ProxyCommand ssh jumphost.company.com nc -q0 %h %p

You might wonder what on earth that is doing, but it's really quite simple. The first Host stanza just means we won't use any special commands to connect to the jump host itself. The second Host stanza says that in order to connect to anything ending in .company.com (but excluding jumphost.company.com because it just matched the previous stanza) we will first SSH to the jump host and then use nc(1) (i.e. netcat) to connect to the relevant port (%p) on the host we originally asked for (%h). Your local SSH client now has a session open to the jump host which is acting like it's a socket to the SSH port on the host you wanted to talk to, so it just uses that connection to establish an SSH session with the machine you wanted. Simple!

For those of you lucky enough to be connecting to servers that have OpenSSH 5.4 or newer, you can replace the jump host ProxyCommand with:

ProxyCommand ssh -W %h:%p jumphost.company.com

Re-use existing SSH connections

Some people swear by this trick, but because I'm very close to my servers and have a decent CPU, the setup time for connections doesn't bother me. Folks who are many milliseconds from their servers, or who don't have unquenchable techno-lust for new workstations, may appreciate saving some time when establishing SSH connections.

The idea is that OpenSSH can place connections into the background automatically, and re-use those existing secure channels when you ask for new ssh(1), scp(1) or sftp(1) connections to hosts you have already spoken to. The configuration I would recommend for this would be:

Host *
    ControlMaster auto
    ControlPath ~/.ssh/control/%h-%r-%p
    ControlPersist 600

This will do several things:

  • ControlMaster auto will cause OpenSSH to establish the "master" connection sockets as needed, falling back to normal connections if something is wrong.
  • The ControlPath option specifies where the connection sockets will live. Here we are placing them in a directory and giving them filenames that consist of the hostname, login username and port, which ought to be sufficient to uniquely identify each connection. If you need to get more specific, you can place this section near the end of your config and have explicit ControlPath entries in earlier Host stanzas.
  • ControlPersist 600 causes the master connections to die if they are idle for 10 minutes. The default is that they live on as long as your network is connected - if you have hundreds of servers this will add up to an awful lot of ssh(1) processes running on your workstation! Depending on your needs, 10 minutes may not be long enough.

Note: You should make the ~/.ssh/control directory ahead of time and ensure that only your user can access it.
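In other words:

mkdir -p ~/.ssh/control
chmod 700 ~/.ssh/control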

Cope with old/buggy SSH devices

Perhaps you have a bunch of management devices in your infrastructure and some of them are a few years old already. Should you find yourself trying to SSH to them, you might find that your connections don't work very well. Perhaps your SSH client is too new and is offering algorithms their creaky old SSH servers can't abide. You can strip down the long default list of algorithms to the ones that a particular device supports, e.g.:

Host power-device-1.company.com
    HostKeyAlgorithms ssh-rsa,ssh-dss

That's all folks

Those are the most useful tips and tricks I have for now. Hopefully someone will read this and think "hah! I can do much more advanced stuff than that!" and one-up me :)

Do feel free to comment if you do have something sneaky to add, I'll gladly steal your ideas!


Evil shell genius

Jono Lange was committing acts of great evil in Bash earlier today. I gave him a few pointers and we agreed that it was sufficiently evil that it deserved a blog post. So, if you find yourself wishing you could get pretty desktop notifications when long-running shell commands complete, see his post here for the details.


HP Microserver Remote Access helper

I've only had the Remote Access card installed in my HP Microserver for a few hours, and already I am bored of accessing it by first logging into the web UI, then navigating to the right bit of the UI, then clicking a button to download a .jnlp file and then running that with javaws(1). Instead, I have written some Python that will log in for you, fetch the file and execute javaws. Much better! You can find the code here, and you'll want to have python-httplib2 installed.


HP Microserver Remote Access Card

I've been using an HP ProLiant Microserver (N36L) as my fileserver at home for about a year and it's been a really reliable little workhorse. Today I gave it a bit of a spruce up with 8GB of RAM and the Remote Access Card option. Since it came with virtually no documentation, and since I can't find any reference online to anyone else having had the same issue I had, I'm writing this post so Google can help future travellers.

When you are installing the card, check in the BIOS's PCI Express options that it is set to automatically choose the right graphics card to use. I had hard-coded it to use the onboard VGA controller. The reason this matters is that the RAC card is actually a graphics card, so the BIOS needs to be able to activate it as the primary card. If you don't change this setting, the RAC will appear to work normally, but its vKVM remote video feature will only ever show you a green screen window with the words "OUT OF RANGE" in yellow letters.

Annoyingly, I thought this was my 1920x1080 monitor confusing things, so it took me longer to fix than it should have, but there we go.


What is the value of negative feedback on the Internet?

I'm sure we've all been there - you buy something on eBay or from a third party on Amazon, and what you get is either rubbish or not what you asked for. The correct thing to do is to talk to the seller first to try and resolve your problem, and then, when everything is said and done, leave feedback rating the overall experience.

Several times in the last year I have gone through this process and ended up feeling the need to leave negative feedback. The most obvious case was some bluetooth headphones I'd bought from an eBay seller in China that were so obviously fake that it was hilarious he was even trying to convince me I was doing something wrong. In each of these cases, I have been contacted shortly after the negative feedback to ask if I will remove the feedback in return for a full/partial refund.

This has tickled the curious side of my brain into wanting to know what the value of negative feedback is. The obvious way to find out would be to buy items of various different prices, leave negative feedback and see how far the sellers are prepared to go to preserve their reputations. The obvious problem here is that this would be an unethical and unfair way to do science. Perhaps it would be possible to crowd-source anecdotes until they count as data?


Dear Apple

I just woke up here in London and saw the news about Steve Jobs. It's early and, as usual for this time of day, my seven month old son is playing next to me. He has no concept of what my iPhone is, but it holds his fascination like none of his brightly coloured toys do. Only the iPad can cause him to abandon his toys and crawl faster. I'd like to thank you all, including Steve, for your work. You have brought technology to ordinary people in a way that delights them without them having to know why. Please keep doing that for a very long time.