Knowing the Network

When I joined iWeb Hosting five years ago we had far fewer servers. They were easily managed by hand, with some bespoke alerting to tell us when a server stopped responding.

As we’ve grown in server count, staff and the importance of the services we host, it became clear that pre-empting problems was better for us and for our customers. So we went from one end of our network to the other, adding Nagios probes for just about everything you could think of.

By default Nagios emails our support group about problems, and again when they are acknowledged or resolved. It also provides a ‘Tactical Overview’ page showing service status. The email became unmanageable: working through our inbox in order, we would start looking at a problem only to find it had already been fixed or had gone away on its own (dropping back below a warning threshold, for example).

Meanwhile, the Tactical Overview interface leaves a little to be desired, design-wise. We also wanted several other systems we’ve built for managing the network tied into one screen: our backup status, packet loss levels and any changes to our firewall rule audit trail.

Jon set about building us a “Defcon” page that gives us a score between 1 (the worst, an emergency) and 5 (everything’s fine). It pulls from our shared IMAP folders, nmap, Smokeping and Nagios, and uses a simple heuristic to decide how worried we should be and how urgently we need to get back to 5.
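The heuristic itself isn’t described here, so what follows is only a sketch of the kind of “worst signal wins” scoring such a page could use. The `Signals` fields, how each maps to a monitoring source, and every threshold are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical inputs, one or two per monitoring source."""
    nagios_critical: int = 0      # Nagios services in CRITICAL state
    nagios_warning: int = 0       # Nagios services in WARNING state
    unhandled_mail: int = 0       # unhandled messages in the shared IMAP folders
    packet_loss_pct: float = 0.0  # worst recent packet loss seen by Smokeping
    unexpected_ports: int = 0     # ports nmap finds open that shouldn't be


def defcon_level(s: Signals) -> int:
    """Score from 1 (emergency) to 5 (everything's fine); the worst signal wins.

    All thresholds below are illustrative assumptions, not real tuning.
    """
    level = 5
    if s.nagios_warning or s.unhandled_mail:
        level = min(level, 4)
    if s.nagios_critical or s.packet_loss_pct >= 1.0:
        level = min(level, 3)
    if s.nagios_critical >= 3 or s.packet_loss_pct >= 5.0 or s.unexpected_ports:
        level = min(level, 2)
    if s.nagios_critical >= 10 or s.packet_loss_pct >= 20.0:
        level = min(level, 1)
    return level
```

Each rule can only lower the score, so the rules don’t need to be kept in any particular order. Adding another source means adding one more `if`.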

Simply keeping the number at 5 at all times is a central motivation for those of us in the support group. We track how much time is spent at each level, which I’ve documented elsewhere. Now that our status was a single number, we exposed it through a REST interface and turned our minds to how it could be visualised.
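Here is a minimal sketch of time-at-level tracking plus a JSON export, using only the Python standard library. It assumes nothing about Jon’s actual implementation. `LevelTracker` adds up the seconds spent at each level whenever the level changes. A small HTTP handler then serves the current snapshot. The `/defcon` path, the JSON field names and the default port are all hypothetical.

```python
import json
import time
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer


class LevelTracker:
    """Holds the current Defcon level and the seconds spent at each level."""

    def __init__(self, level: int = 5, clock=time.monotonic):
        self._clock = clock              # injectable so tests can fake time
        self.level = level
        self._since = clock()
        self._totals = defaultdict(float)

    def set_level(self, level: int) -> None:
        """Commit time spent at the old level, then switch to the new one."""
        now = self._clock()
        self._totals[self.level] += now - self._since
        self.level, self._since = level, now

    def snapshot(self) -> dict:
        """Current level plus totals, counting the time accrued at it so far."""
        totals = dict(self._totals)
        totals[self.level] = totals.get(self.level, 0.0) + self._clock() - self._since
        return {"level": self.level,
                "seconds_at_level": {str(k): v for k, v in sorted(totals.items())}}


def make_handler(tracker: LevelTracker):
    """Build a request handler class bound to one tracker."""
    class DefconHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/defcon":  # hypothetical resource path
                self.send_error(404)
                return
            body = json.dumps(tracker.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):  # silence per-request logging
            pass

    return DefconHandler


def serve(tracker: LevelTracker, host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve GET /defcon as JSON until interrupted."""
    HTTPServer((host, port), make_handler(tracker)).serve_forever()
```

Calling `serve(LevelTracker())` and requesting `http://127.0.0.1:8000/defcon` returns the current level together with the per-level totals. A monotonic clock keeps those totals sane if the wall clock is adjusted.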

First up was a lunch-time Android project. The Android API is extremely simple to use, and we got part way through implementing it before getting distracted by the iPhone HTML extensions, which offered an even quicker time-to-market. Everyone in the support group uses one of these two smartphones; we like our toys.

[Images: Defcon iPhone application icon; Defcon iPhone screenshot]

This is an extremely useful tool, especially for whoever is on call, who can periodically check on the general health of all our servers. Wherever you are, it gives a sense of calm that nothing is broken without you knowing about it.

However, it’s still a very IT-centric tool, and it’s just software. Lunch-time projects are about doing something we don’t usually do in our jobs, and electronics has been a frequent topic. Given the number of half-finished projects in our wake, we decided on a project that would be achievable within two lunchtimes:

[Image: Defcon breadboard]

Now we have an Arduino-powered status light! Admittedly it’s very small, and it should be considered a prototype for future things. The wiring was done by Ed and most of the code by me (available). It took less than five minutes to hook into our Defcon system, thanks to the very handy python-serial library.

The device just listens at 9600 baud for the characters 1 through 5, and displays them accordingly. And that’s it.
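On the host side, very little code is needed to drive that protocol. The following is a sketch, not the code we actually used. It writes the level as a single ASCII digit at 9600 baud. The device path `/dev/ttyUSB0` and the two-second settle delay are assumptions. Many Arduino boards reset when the serial port is opened, so a byte written immediately afterwards can be lost.

```python
import time

VALID_LEVELS = range(1, 6)


def encode_level(level: int) -> bytes:
    """Encode a Defcon level as the single ASCII digit the device listens for."""
    if level not in VALID_LEVELS:
        raise ValueError(f"Defcon level must be between 1 and 5, got {level!r}")
    return str(level).encode("ascii")


def send_level(port, level: int) -> None:
    """Write the level to an open serial port, or any object with write() and flush()."""
    port.write(encode_level(level))
    port.flush()


def open_arduino(device: str = "/dev/ttyUSB0", settle_seconds: float = 2.0):
    """Open the status light at 9600 baud (the device path is a hypothetical default)."""
    import serial  # provided by the pyserial (python-serial) package

    port = serial.Serial(device, baudrate=9600, timeout=1)
    # Assumption: opening the port auto-resets many Arduino boards, so wait for
    # the bootloader to hand over to the sketch before sending anything.
    time.sleep(settle_seconds)
    return port
```

With a board attached, `with open_arduino() as port: send_level(port, 5)` would tell the device to display level 5. Keeping `send_level` independent of pyserial means it can be tested against an in-memory buffer.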

It’s an attention-getter when people come up to our cluster of desks. Now the challenge is to come up with something you can actually see without cupping your hand over it to shield it from the light :-).

posted 29 July 2009

Category: Hosting


