Firewall rules for HomeKit with HomeAssistant

I’ve recently been playing with the popular open source home automation software Home Assistant. One of the nice features of this platform is that it can export most of the devices it manages as HomeKit devices for easy use from iOS devices.

HomeKit isn’t perfect – as a generic management platform, it’ll never be as good at doing thing X as a native app from vendor X, since it just can’t expose all the same parameters and configurability.

Despite this, there are some compelling features for a household that’s fully in the Apple ecosystem:

  1. It puts all the assorted IoT “stuff” that we have into a single interface. This interface is available on my iPhone, iPad and Watch.
  2. It makes it easy to share access with others who probably aren’t so technical that they’re running a VPN to your house, thanks to the built-in tunnelling via an Apple TV or HomePod.
  3. The protocol has been opened up by Apple, so you can now write and use uncertified devices with libraries such as HAP-Python or HAP-NodeJS. This is how Home Assistant is able to expose devices connected via other means to the HomeKit network.

The only annoying thing is that if you get your firewall rulesets wrong, it can be tricky to debug.

I had opened up TCP port 51827 (used by HomeKit) and was able to pair my device successfully, but then had weird issues where the accessories would go into “No Response” state for prolonged periods and only occasionally update with the latest information.

Steve says you’re holding it wrong

The trick to finding this was to do some packet dumping. I ran a packet dump for all traffic from my phone to the server running the Home Assistant app to see what was coming across the wire and could see a pile of mDNS requests that weren’t being answered.
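If you want to reproduce this kind of dump on a Linux host, something along these lines will do it – this assumes `tcpdump` is installed, and the interface name and phone IP below are placeholders for your own:

```shell
# Watch everything the phone (192.168.1.20 here) sends to this host,
# with -n to skip name resolution so raw IPs and ports are visible.
sudo tcpdump -n -i eth0 src host 192.168.1.20

# Or narrow the capture down to just mDNS traffic:
sudo tcpdump -n -i eth0 udp port 5353
```

Unanswered queries to UDP port 5353 stand out very quickly in this output.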

Wireshark never lies

mDNS is a tricky protocol – essentially it’s DNS, but instead of going to a name server for resolution, devices using mDNS send out a multicast packet to the network and wait to see who replies with the answer. Devices implementing mDNS need to listen to these packets and respond where appropriate. It’s most commonly implemented as Bonjour (Apple) and Avahi (Linux).

This means we need to set up a firewall rule for UDP port 5353 so that HomeKit clients can find the HomeKit accessory (in this case, Home Assistant). Without it, you get the “No Response” problem when lookups fail.
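A handy way to confirm the accessory is actually answering is to browse for the HomeKit Accessory Protocol service type (`_hap._tcp`) from another machine on the network – with the built-in `dns-sd` tool on macOS, or `avahi-browse` on Linux:

```shell
# macOS: browse for HomeKit accessories being advertised via mDNS.
dns-sd -B _hap._tcp local.

# Linux (avahi-utils package): browse and resolve HomeKit accessories.
avahi-browse --resolve _hap._tcp
```

If the Home Assistant bridge doesn’t show up here from the client’s side of the firewall, no amount of TCP connectivity on port 51827 will help.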

Why did it work at all without it? I’m not 100% sure, but I think HAP-Python might occasionally send out its own multicast messages advertising itself to iOS devices, which lets them find it for a period of time – but when the TTL expires and they try to re-resolve the connected accessories, it’s nowhere to be found.

So the complete set of iptables rules you probably want (or something like them) is:

# mDNS
iptables -A INPUT -p udp -m multiport --dports 5353 -j ACCEPT

# Homekit Protocol
iptables -A INPUT -p tcp -m multiport --dports 51827 -j ACCEPT

# Home Assistant interface
iptables -A INPUT -p tcp -m multiport --dports 8123 -j ACCEPT
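One extra gotcha: rules appended with `iptables -A` only live until the next reboot. How you persist them varies by distribution, but on Debian/Ubuntu with the `iptables-persistent` package installed (an assumption about your setup) it looks like:

```shell
# Save the running ruleset so it is restored at boot.
sudo netfilter-persistent save

# Or write the rules file out by hand on distributions that load it at boot:
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
```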

The bathroom fan debacle

I completed this project a while back and had the images saved up for a blog post – somehow almost a year has gone by since then in the blink of an eye. But anyway, enjoy this delayed update about the exciting world of bathroom fans!

If our house had one deficiency, it would be that the layout of the property has resulted in a rather small, interior-only bathroom. Since this bathroom has no direct access to any outside walls for easy ventilation via windows, it already had a pretty chunky fan installed when we purchased the property, to extract all the shower moisture and other undesirable elements.

By my estimation, the fan would be a good 20 years old and whilst it was doing a good job of extracting air from the bathroom it had two major issues:

  1. Whilst it extracted air successfully from the bathroom, it didn’t send it anywhere useful – instead it was dumping all the moisture directly into the attic space making the attic (and by extension, the whole house) damp.
  2. The bathroom roof features a large skylight which is great for making the room feel light and more spacious than it actually is, but it also acts as a giant shower dome, with steam going up into the skylight space and condensing as the fan is not at the highest point of the roof.

The second issue really started causing significant problems: the moisture was damaging the paint and plasterboard, and because the small bathroom makes it almost impossible to extend a ladder up to the super high ceilings, mould was becoming an issue as we couldn’t get access to tackle the moisture.


The situation required remedial work, and one day the old fan made the decision easy: the cover had become brittle enough that it suddenly fell out of the ceiling and into the bath one evening, without warning, with a loud crash.

Unscheduled rapid disassembly


For this replacement project, I started by tackling the ventilation problem, to make sure the air would actually get extracted out of the house rather than into the attic (side note: pretty sure venting bathrooms into attics is now forbidden under the building codes for any new installs).

To do this, I bought a 150mm Simx/Manrose Thru Roof Cowl Kit – these can be found at consumer hardware stores, and the kit consists of a plastic tube, the metal cowl on top, a suitable rubber mounting boot and assorted hardware.

Going through the roof was the only option on this property. Not only is the bathroom in the middle of the house, but even if I ran a long ducting run to the nearest wall, there are no soffits or other tidy locations to install the vent without damaging the character of the property.

Who cares about the iPhone, *this* is the unboxing you’ve been holding out for.

Installing this was… fun. I purchased my new most-favourite-thing-ever, a reciprocating saw (also called a sabre saw), in order to cut a hole in the roof. This tool has gone on to work hard in a large number of other projects and I consider it an extremely good investment given its versatility.

Cut all the things!

One of the quirks of our property being approximately 100 years old is that the roof is sarked with timbers, which makes it extremely solid – a proper house, from a more refined age. They stopped building houses like this a long time ago; I think the whole non-sustainable forestry part became a slight issue…

In this situation the solid build both helped and hindered – trying to cut through corrugated iron is a lot easier when there isn’t 20mm of native hardwood underneath it, as the saw has the habit of picking up the iron and vibrating it like crazy whilst trying to cut the timbers.

But the upside is that it meant I could screw the roof vent directly into the timber and be assured of a tight and long-lasting fit, whereas if I had only floating iron sheets it would have been a lot trickier to get a really tight fit. If that had been the case, as in a more modern property, I’d probably have bought some plywood sheets and run them across the rafters to provide a solid surface to screw the vent into, for a similar effect.

You can get an idea of how solid the house is from this photo inside the attic. In this photo the vent has been installed and ducting attached.

Once I cut the roof hole, I sealed the vent kit rubber boot thoroughly with silicone, with a layer between the boot and the iron sheets, as well as around the edge of the boot and the plastic tube.

The rubber boot has a metal strip allowing it to be bent to fit the corrugated iron snugly. Once the screws went in it formed a very tight seal, and the silicone makes 100% sure it stays sealed.

Note the diagonal placement – this is intentional since it ensures you don’t end up with a pool of water collecting at the top of the boot, which could happen if you placed it square.

The new vent next to the bathroom skylight. You can see the interior intake through the skylight.

You’ll notice that I’ve cut the hole overlapping two separate sheets. This… isn’t ideal – I’d much rather have cut into a single sheet (easier to seal, less to go wrong as the sheets expand and contract) – but I was trying to utilise an existing hole already cut in the sarking timbers for what must have been a small vent (maybe an overflow pipe?) in the past, to minimise the amount of cutting needed.

This placement also caused another small annoyance for me, which is that the vent is not quite 100% straight and you can notice it sometimes. It’s not an issue for water ingress in the rain, but it annoys me not being 100% perfect. That being said, I’ve had trades install other vents in the property (future project post coming!) and they didn’t get it properly straight either, so I feel somewhat vindicated.

Slightly wonky vent will annoy me every day forever more… but if I’ve learnt anything re DIY, don’t fuck with anything that ain’t broke.


To connect the fan to the vent, I bought some insulated ducting. It’s important to use insulated ducting for this, rather than the cheaper uninsulated stuff, since bathroom air is warm and moist – if the ducting is not insulated, you are likely to suffer condensation in the duct as the air cools while transiting through the cooler attic. By keeping the air as warm as possible on its way out, you can avoid this.

I was worried about some condensation occurring in the vent tube itself – I can’t insulate outside of the roof, after all – but this fortunately hasn’t been a problem. There was also some concern that the roof cowl wouldn’t be enough to keep out rain in Wellington winds, but this hasn’t been an issue either – it could be a different situation in more exposed areas of the city, and there are cowl fittings more suited to unfriendly wind conditions if that were the case.


Next I had to sort out a new fan. I was tempted to keep the existing fan given its strong air throughput and the motor still running fine, but I couldn’t easily find a replacement front for it, and the back was completely exposed so there was no easy way to fit the ducting to the back of the fan.

I ended up buying a 250mm “high pressure” fan, but unfortunately this didn’t work out well for me. Despite being described as high pressure, I can confirm that these consumer-available bathroom fans are absolutely useless and not worth buying if you want anything more than a faint breeze.

I had it in place for 1 week, during which time we not only quickly noticed it struggling to remove the steam from the room, but also that it was slowly filling up with water that was condensing in the fan as it struggled to clear the room.

First iteration. Note the side-exiting fan, which required twisted ducting – not ideal. You can also see the hole in the sarking is larger than needed – that’s because there was a pre-existing hole that I took advantage of, which was longer than I needed.

Unfortunately given how much of a complete failure this fan had been, I had to remove it and find an alternative.


The solution was a 150mm pro-series Simx inline fan capable of delivering 166l/s (597m3/hr) of air throughput. For comparison, the previous fan was maybe 69l/s (250m3/hr) at best, and even that seems dubious given how poorly it performed.

Nimbus is a big fan of this model.

I couldn’t find these at any consumer hardware store and ended up taking advantage of a friend with an electrical trade account who was able to place an order with the supplier for me.

The one key difference with this solution vs my previous attempt is that the fan is inline, which means the bathroom needed a vent installed, with ducting from the intake vent to the fan, then ducting to the outlet vent. This does have some noise advantages since you can place the fan motor further away from the intake.

Second iteration. Mounting it on framing timber is a little overkill but I had framing timber and not plywood handy. Because of the solid roof, I found it easier to mount to the underside of the roof, rather than building a platform on the attic floor.

This solution worked much better – the fan is able to extract a considerable amount of air, and whilst a bit noisy due to the high RPM and small diameter, it does a good job of clearing the bathroom throughout a shower without letting too much moisture build up.

When researching this project, it was suggested that you shouldn’t be clever and mix duct sizes (e.g. a 200mm fan into a 150mm vent), so I kept the same spec throughout. If I had gone for a larger roof vent and duct at the beginning, I might have chosen something bigger to get more throughput – larger fans also tend to be quieter (as a general rule).

The other big positive of this approach is that I was able to solve the skylight condensing problem by putting the new intake vent directly into the side of the skylight wall to rapidly extract out the air.

This is working extremely well. We still have the existing damage from past moisture build-up, which will require remedial work (a complete repaint, maybe some new plastering), but since putting in this new vent we’ve had very little moisture build-up – the fan keeps the air moving in the skylight, preventing condensation. And since heat rises, all the steam from the shower naturally gravitates towards the vent anyway.

This photo illustrates the issue with the placement of the old vs new fan – the old one did nothing for all the steam that wafted up into the skylight space, vs the new one keeping that space clear.

I found it really hard to find an intake vent that wasn’t horribly ugly and plasticky, so I ended up paying a bit of a premium for an aluminium model. It was a right pain to install, so maybe I should have gone for a cheap nasty plastic 150mm one that would have been simple to fit, but I think it was worth it and it looks good (once I fix all the paintwork, anyway). I even managed to get it dead centre, which – given I was cutting it out from inside the attic due to the inability to get a ladder high enough in the bathroom – was a pretty good outcome.


Anyway, despite the challenges, I’m pretty happy with this setup now. It’s working reliably, and I’ve checked the ducting a few times to make sure there’s been no moisture build-up or water ingress from outside (both good!), so I’m expecting this solution to last for a long time.

I still need to fix the plasterboard and paint job in the bathroom. It’s kind of stuck pending access to a more flexible ladder/indoor scaffold type system, due to the height of the bathroom ceiling and the very limited placement points for ladders. Still, it annoys me daily, so it’ll get fixed sooner or later when I get really sick of it looking so bad.

Rough cost for the project was around $500-600 NZD in parts – the most expensive bits being the fan motor, and then the roof vent kit – a wall vent solution would be a fair bit cheaper.

If I was doing this project again from scratch, I’d probably have done a few things differently.

  • I’d almost certainly have considered the bigger model and gone for a 200mm fan able to shift 272l/s (980m3/hr) of air. The current model is good, but I’d have enjoyed the bathroom being like a vacuum cleaner for maximum dryness. And the larger fan size should be a bit quieter.
  • I utilised the existing hardwired appliance circuit as a straight replacement of one fan for another, but it would have been good to get a timer fan installed, to keep it running for a given period after the switch is turned off. This is something I might end up getting an electrician to install for me in future anyway, but I might have been able to save some money by getting a model of fan with the timer circuit built in, vs having to now buy a timer module for the circuit.
  • I don’t love the ducting install. I ended up with two 90 degree bends which were unavoidable due to the original hole being intended for a fan directly below the hole, but I’d have preferred an almost straight run to ensure minimal workload for the fan (maybe some noise reduction too?). This could have been easily accomplished by installing the fan further up the roof line and running the duct straight from the interior vent, through the fan, then up and out the roof. But it wasn’t worth sealing one hole and cutting another to fix.
  • If I ever build a house, I’m making sure the bathrooms have at least one exterior wall to make ventilation so much simpler!
  • I put in all the vent bolts (hex head) using an automotive socket set by hand. This works totally fine but just takes ages, so an impact driver would have been a nice addition to the tool kit. That being said, I’ve since done other hex head screws using a socket adaptor drill bit which allows me to use the cordless drill to drive hex head screws, although admittedly lacking the high torque of a proper impact driver.

Deep Dive into ECS

I spent a fair bit of time in 2017 re-architecting the carnival.io platform onto Amazon ECS, including working to handle some tricky autoscaling challenges brought on by the sudden high-load spikes we experience when delivering push messages to customers.

I’ve now summed up these learnings into a deep dive talk on the Amazon ECS architecture that I presented at the Wellington AWS Users Group on February 12th 2018.

This talk explains what container orchestration is, covers some key fundamentals of ECS and how we’ve tackled CI/CD with it, and goes into detail on some of the unique autoscaling challenges caused by millions of cellphones sending home telemetry all at once.

This talk is technical, but includes content appropriate for both beginners wanting to know how ECS functions and experts wanting to see just what can be accomplished with the platform.



Puppet Autosigning & Cloud Recommendations

I was over in Sydney this week attending linux.conf.au 2018 and made a short presentation at the Sysadmin miniconf regarding deploying Puppet in cloud environments.

The majority of this talk covers the Puppet autosigning process which is a big potential security headache if misconfigured. If you’re deploying Puppet (or even some other config management system) into the cloud, I recommend checking this one out (~15mins) and making sure your own setup doesn’t have any issues.



Firebase FCM upstream with Swift on iOS

I’ve been learning a bit of Swift lately in order to write an iOS app for my alarm system. I’m not very good at it yet, but figured I’d write some notes to help anyone else playing with the murky world of Firebase Cloud Messaging/FCM and iOS.

One of the key parts of the design is that I wanted the alarm app and the alarm server to communicate directly with each other without needing public-facing endpoints, rather than the conventional design where the app interacts with the server via an HTTP API.

The intention of this design is that I can dump all the alarm software onto a small embedded computer and, as long as that computer has outbound internet access, it just works™️. No headaches about discovering the endpoint of the service, and security is much simplified as there’s no public-facing web server.

Given I need to deliver push notifications to the app, I implemented Google Firebase Cloud Messaging (FCM) – formerly GCM – for push delivery to both iOS and Android apps.

Whilst FCM is commonly used for pushing to devices, it also supports pushing messages back upstream from the device to the server. In order to do this, the server must connect to FCM via XMPP and the FCM SDK must be embedded in the app.

The server side was reasonably straightforward: I’ve written a small Java daemon that uses a reference XMPP client implementation and wraps some additional logic to work with HowAlarming.

The client side was a bit more tricky. Google has some docs covering how to implement upstream messaging in the iOS app, but I had a few issues to solve that weren’t clearly detailed there.


Handling failure of FCM upstream message delivery

Firstly, it’s important to have some logic in place to handle/report when a message cannot be sent upstream – otherwise you have no way to tell whether it worked. To do this in Swift, I added a notification observer for .MessagingSendError, which is thrown by the FCM SDK if it’s unable to send upstream.

class AppDelegate: UIResponder, UIApplicationDelegate, MessagingDelegate {

 func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
   ...
   // Trigger if we fail to send a message upstream for any reason.
   NotificationCenter.default.addObserver(self, selector: #selector(onMessagingUpstreamFailure(_:)), name: .MessagingSendError, object: nil)
   ...
 }

 @objc
 func onMessagingUpstreamFailure(_ notification: Notification) {
   // FCM tends not to give us any kind of useful message here, but
   // at least we now know it failed for when we start debugging it.
   print("A failure occurred when attempting to send a message upstream via FCM")
 }
}

Unfortunately I’m yet to see a useful error code back from FCM in response to any failure to send a message upstream – I seem to just get back a 501 error for anything that has gone wrong, which isn’t overly helpful… especially since in web programming land, any 5xx-series error implies it’s the remote server’s fault rather than the client’s.


Getting the GCM Sender ID

In order to send messages upstream, you need the GCM Sender ID. This is available in the GoogleService-Info.plist file that is included in the app build, but I couldn’t figure out a way to extract it easily via the FCM SDK. There’s probably a better/nicer way of doing this, but the following hack works:

// Here we are extracting out the GCM SENDER ID from the Google
// plist file. There used to be an easy way to get this with GCM, but
// it's non-obvious with FCM so here's a hacky approach instead.
if let path = Bundle.main.path(forResource: "GoogleService-Info", ofType: "plist") {
  let dictRoot = NSDictionary(contentsOfFile: path)
  if let dict = dictRoot {
    if let gcmSenderId = dict["GCM_SENDER_ID"] as? String {
       self.gcmSenderId = gcmSenderId // make available on AppDelegate to whole app
    }
  }
}
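For a quick sanity check outside the app, the same key can be read on macOS with PlistBuddy (assuming you run it from wherever your GoogleService-Info.plist lives):

```shell
# Print the GCM sender ID straight from the bundled plist.
/usr/libexec/PlistBuddy -c "Print :GCM_SENDER_ID" GoogleService-Info.plist
```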

And yes, although we’re all about FCM now, this part hasn’t been rebranded from the old GCM product, so enjoy having yet another acronym in your app.


Ensuring the FCM direct channel is established

Finally, the biggest cause of upstream message delivery failing for me was trying to send an upstream message before FCM had finished establishing the direct channel.

The SDK establishes this channel automatically whenever the app comes into the foreground, provided you have shouldEstablishDirectChannel set to true. It can take up to several seconds after application launch to actually complete – which means if you try to send upstream too early, the connection isn’t ready and your send fails with an obscure 501 error.

The best solution I found was to use an observer to listen to .MessagingConnectionStateChanged which is triggered whenever the FCM direct channel connects or disconnects. By listening to this notification, you know when FCM is ready and capable of delivering upstream messages.

An additional bonus of this observer is that by the time it indicates the FCM direct channel is established, the FCM token for the device is available for your app to use if needed.

So my approach is to:

  1. Set up FCM with shouldEstablishDirectChannel set to true (otherwise you won’t be going upstream at all!).
  2. Set up an observer on .MessagingConnectionStateChanged.
  3. When triggered, use Messaging.messaging().isDirectChannelEstablished to see if we have a connection ready for us to use.
  4. If so, pull the FCM token (device token) and the GCM Sender ID and retain in AppDelegate for other parts of the app to use at any point.
  5. Dispatch the message upstream with whatever you want in messageData.

My implementation looks a bit like this:

class AppDelegate: UIResponder, UIApplicationDelegate, MessagingDelegate {

 func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
  ...
  // Configure FCM and other Firebase APIs with a single call.
  FirebaseApp.configure()

  // Setup FCM messaging
  Messaging.messaging().delegate = self
  Messaging.messaging().shouldEstablishDirectChannel = true

  // Trigger when FCM establishes its direct connection. We want to know this to avoid race conditions where we
  // try to post upstream messages before the direct connection is ready... which kind of sucks.
  NotificationCenter.default.addObserver(self, selector: #selector(onMessagingDirectChannelStateChanged(_:)), name: .MessagingConnectionStateChanged, object: nil)
  ...
 }

 @objc
 func onMessagingDirectChannelStateChanged(_ notification: Notification) {
  // This is our own function listening for the direct connection to be established.
  print("Is FCM Direct Channel Established: \(Messaging.messaging().isDirectChannelEstablished)")

  if (Messaging.messaging().isDirectChannelEstablished) {
    // Set the FCM token. Given that a direct channel has been established, it kind of implies that this
    // must be available to us.
   if self.registrationToken == nil {
    if let fcmToken = Messaging.messaging().fcmToken {
     self.registrationToken = fcmToken
     print("Firebase registration token: \(fcmToken)")
    }
   }

    // Here we are extracting out the GCM SENDER ID from the Google plist file. There used to be an easy way
    // to get this with GCM, but it's non-obvious with FCM so we're just going to read the plist file.
    if let path = Bundle.main.path(forResource: "GoogleService-Info", ofType: "plist") {
      let dictRoot = NSDictionary(contentsOfFile: path)
      if let dict = dictRoot {
        if let gcmSenderId = dict["GCM_SENDER_ID"] as? String {
          self.gcmSenderID = gcmSenderId
        }
      }
    }

  // Send an upstream message
  let messageId = ProcessInfo().globallyUniqueString
  let messageData: [String: String] = [
   "registration_token": self.registrationToken!, // In my use case, I want to know which device sent us the message
   "marco": "polo"
  ]
  let messageTo: String = self.gcmSenderID! + "@gcm.googleapis.com"
  let ttl: Int64 = 0 // Seconds. 0 means "do immediately or throw away"

  print("Sending message to FCM server: \(messageTo)")

  Messaging.messaging().sendMessage(messageData, to: messageTo, withMessageID: messageId, timeToLive: ttl)
  }
 }

 ...
}

For a full FCM downstream and upstream implementation example, you can take a look at the HowAlarming iOS app source code on GitHub, and if you need a server reference, take a look at the HowAlarming GCM server in Java.

 

Learnings

It’s been an interesting exercise – I wouldn’t particularly recommend this architecture for anyone building real-world apps. The main headaches I ran into were:

  1. The FCM SDK just seems a bit buggy. I had a lot of trouble with the GCM SDK, and the move to FCM did improve things a bit, but there’s still a number of issues that occur from time to time. For example: occasionally an FCM direct channel isn’t established, for no clear reason, until the app is terminated and restarted.
  2. Needing to do things like making sure FCM Direct Channel is ready before sending upstream messages should probably be handled transparently by the SDK rather than by the app developer.
  3. I have still yet to get background code execution on notifications working properly. I get the push notification without a problem, but seem to be unable to trigger my app to execute code, even with content-available == 1. Maybe a bug in my code, or FCM might be complicating the mix in some way vs using pure APNS. Probably my code.
  4. It’s tricky using FCM messages alone to populate the app data – I occasionally had issues such as messages arriving out of order, not arriving at all, or ending up duplicated. This requires the app code to process, sort and re-populate the table view controller, which isn’t a lot of fun. I suspect it would be a lot easier to simply re-populate the view controller on load from an HTTP endpoint, and use FCM messages just to trigger refreshes of the data if the user taps on a notification.

So my view for other projects in future would be to use FCM purely for server->app message delivery (ie: “tell the user there’s a reason to open the app”) and then rely entirely on a classic app client and HTTP API model for all further interactions back to the server.


MongoDB document depth headache

We ran into a weird problem recently where we were unable to sync a replica set running MongoDB 3.4 when adding new members to the replica set.

The sync would begin, but at some point during the sync it would always fail with:

[replication-0] collection clone for 'database.collection' failed due to Overflow:
While cloning collection 'database.collection' there was an error
'While querying collection 'database.collection' there was an error 
'BSONObj exceeded maximum nested object depth: 200''

(For extra annoyance, the sync would continue syncing all the other databases and collections on the replica set, only realising at the very end that it had actually failed earlier, and then restarting the sync from the beginning again.)


The error means that one or more documents have a nesting depth over 200. This could be a chain of objects or a chain of arrays in a document – a mistake that isn’t too tricky to cause with a buggy loop or ORM.

But how is it possible for this document to be in the database in the first place? Surely it should have been refused at insert time? Well, the nested-document depth limit and its enforcement have changed at various times in past versions, and a long-lived database such as ours from the early MongoDB 2.x days may have had these bad documents inserted before the max depth limit was enforced – only now, when we try to use the document, do the limits become a problem.

In our case the document was old; it had synced without issue on MongoDB 3.0 but now failed with MongoDB 3.4.

Finding the document is tricky – the replication process helpfully does not log the document ID, so you can’t go and purge it from the collection to resolve the issue.

With input from colleagues with better Mongo skills than I, we figured out three queries that allowed us to identify the bad documents.

1. This query finds any documents that have a long chain of nested objects inside them.

db.collection.find({ $where: function() { return tojsononeline(this).indexOf("} } } } } } } } }") != -1 } })

2. This query finds any documents that have a long chain of nested arrays. This was the specific issue in our case and this query successfully identified all the bad documents.

db.collection.find({ $where: function() { return tojsononeline(this).indexOf("] ] ] ] ] ] ]") != -1 } })

3. And if you get really stuck, you can find any bad document (whatever the reason) by reading every document and re-writing it back out to another collection. This ensures each document gets all the limits applied at write time, identifying the IDs of any that are refused, regardless of the specific reason for the refusal.

db.collection.find({}).forEach(function(d) { print(d["_id"]); db.new_collection.insert(d) });

Note that all of these queries tend to be performance impacting since you’re asking your database to read every single document. And the last one, copying collections, could take considerable time and space to complete.

If you have data of any notable size, I recommend restoring the replica set to a test system and performing the operation there, where you know it's not going to impact production.

Once you find your bad document, you can display it with:

db.collection.find({ _id: ObjectId("54492129902178d6f600004f") });

And delete it entirely (assuming nothing important in it!) with:

db.collection.deleteOne({ _id: ObjectId("54492129902178d6f600004f") });

MacOS High Sierra unable to free disk space

I recently ran out of disk space on my iMac. After migrating a considerable amount of undesirable data to either the file server or /dev/null, I found that despite my efforts, the amount of free disk space had not increased.

I was worried it was an issue with the new APFS file system introduced to all SSD-using Macs as of High Sierra, but in this case it turns out the issue is that Time Machine retains local snapshots on disk, in addition to the full backup history that is retained on the network time machine device.

Apple state that they automatically remove local snapshots when disk space is low, but their definition of low is apparently only 5GB of free space remaining – not much working space in 2017, when you might want 22GB of scratch space for a single hour of 4K 30FPS footage.

On older MacOS releases it was possible to disable the local snapshot feature entirely; this doesn’t seem to be the case with High Sierra, but it does appear possible to force an immediate purge of local snapshots with the following command:

sudo tmutil thinLocalSnapshots / 10000000000 4

Here the arguments are the mount point, the number of bytes you’d like reclaimed (10000000000, roughly 10GB) and an urgency level from 1 to 4.

Back into the time vortex with you, filthy snapshots!

Note that this snapshot usage is not visible as a distinct item in the Disk Utility or Storage Management application.
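You can, however, list the snapshots by name from the terminal – a quick sanity check before and after thinning (this assumes High Sierra’s tmutil; the subcommand doesn’t exist on older releases):

```
# List APFS local snapshots on the root volume. Entries look like
# com.apple.TimeMachine.2017-12-01-103000
tmutil listlocalsnapshots /
```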

In my case, all the snapshots appeared to be within the last 24 hours, so if I hadn’t urgently needed the disk space, I suspect the local snapshots would have flushed themselves after a 24 hour period, restoring considerable disk space.

The fact this isn’t an opt-in user-accessible feature is a shame. It adds the convenience of not having to get physical access to the backup drive or time capsule-like-thingy in order to restore data, but any users of systems with SSD-only storage are likely to be a bit precious about how every GB is used, and there’s almost no transparency about how much space is being consumed. Especially annoying when you urgently need more space and are stuck wondering why nothing is freeing up room…


Access Route53 private zones cross account

Using Route53 private zones can be a great way to maintain a private internal zone for your server infrastructure. However, sometimes you may need to share this zone with another VPC in the same or in another AWS account.

The first situation is easy – a Route53 zone can be associated with any number of VPCs within a single AWS account using the AWS console.

The second is more tricky but is doable by creating a VPC association authorization request in the account with the zone, then accepting it from the other account.

# Run against the account with the zone to be shared.
aws route53 \
create-vpc-association-authorization \
--hosted-zone-id abc123 \
--vpc VPCRegion=us-east-1,VPCId=vpc-xyz123 

# Run against the account that needs access to the private zone.
aws route53 \
associate-vpc-with-hosted-zone \
--hosted-zone-id abc123 \
--vpc VPCRegion=us-east-1,VPCId=vpc-xyz123 \
--comment "Example Internal DNS Zone"

# List authori(z|s)ations once done
aws route53 \
list-vpc-association-authorizations \
--hosted-zone-id abc123

This doesn’t even require VPC peering since it works behind the scenes, with the zone now being resolvable using the default VPC DNS server in each VPC that has been associated.

Note that the one catch is that this does not help you if you’re linking to a non-AWS VPC environment, such as an on-prem data centre via IPSec VPN or Direct Connect. Even though you can route to the VPC and systems inside it, the AWS DNS resolver for the VPC will refuse requests from IP space outside of the VPC itself.

So the only option is to have an EC2 instance acting as a DNS forwarder inside the VPC, one which is reachable from the linked data centre and, since it’s in the VPC, can use the resolver.
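As a sketch of what that forwarder could look like – here assuming dnsmasq on an EC2 instance in a hypothetical VPC with CIDR 10.0.0.0/16, whose AmazonProvidedDNS resolver therefore sits at the base address plus two (10.0.0.2):

```
# /etc/dnsmasq.conf (hypothetical forwarder instance)

# Listen on the instance's VPC-facing interface so the on-prem
# network can reach it over the VPN / Direct Connect link.
interface=eth0

# Forward all queries to the VPC's own resolver, which can see the
# associated Route53 private zones.
server=10.0.0.2

# Cache answers to cut down on cross-link round trips.
cache-size=1000
```

Remember the instance’s security group needs to permit UDP/TCP 53 from the on-prem address space.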


FailberryPi – Diverse carrier links for your home data center

Given the amount of internet connected things I now rely on at home, I’ve been considering redundant internet links for a while. And thanks to the affordability of 3G/4G connectivity, it’s easier than ever to have a completely diverse carrier at extremely low cost.

I’m using 2degrees which has a data SIM sharing service that allows me to have up to 5 other devices sharing the one data plan, so it literally costs me nothing to have the additional connection available 24×7.

My requirements were to:

  1. Handle the loss of the wired internet connection.
  2. Ensure that I can always VPN into the house network.
  3. Ensure that the security cameras can always upload footage to AWS S3.
  4. Ensure that the IoT house alarm can always dispatch events and alerts.

I ended up building three distinct components for a failover solution that supports flipping between my wired (VDSL) and wireless (3G) connections:

  1. A small embedded GNU/Linux system that can bridge a USB 3G modem and an ethernet connection, with smarts to recover from various faults (like a crashed 3G stick).
  2. A dynamic DNS solution, since my mobile telco certainly isn’t going to give me a static IP address, but I need inbound traffic.
  3. A DNS failover solution so I can redirect inbound requests (eg home VPN) to the currently active endpoint automatically when a failure has occurred.

 

The Hardware

I considered using a Mikrotik with USB for the 3G link – it is a supported feature, but I decided against this route since I would need to replace my perfectly fine router with one that has a USB port, plus I know from experience that USB 3G modems are fickle beasts that would likely need some scripting to work around various issues.

For the same reason I excluded the various 3G/4G router products that take a USB modem and then provide ethernet or WiFi. I’m very dubious about how fault tolerant these products are (or how secure, if consumer routers are anything to go by).

I started off the project using a very old embedded GNU/Linux board and a 3G USB modem from the spare parts box. Whilst I did eventually recycle this hardware into a working setup, the old board had a very poor USB controller and was throttling my 3G connection to around 512kbps. :-(

Initial approach – Not a bomb, actually an ancient Gumstix Verdex with 3G modem.

So I started again, this time using the very popular Raspberry Pi 2B hardware as the base for my setup. This is actually the first time I’ve played with a Raspberry Pi and I actually really enjoyed the experience.

The requirements for the router are extremely low – move packets between two interfaces, dial a modem and run some scripts. It actually feels wasteful using a whole Raspberry Pi with its 1GB of RAM and quad core ARM CPU, but they’re so accessible and affordable that it’s not worth the time messing around with more obscure embedded boards.

Pie ingredients

It took me all of 5 mins to assemble and boot an OS on this thing and have a full Debian install ready for work. For this speed and convenience I’ll happily pay a small price premium for the Raspberry Pi over other random embedded vendors with much more painful install and upgrade processes.

Baked!

It’s important to get a good power supply – 3G/4G modems tend to consume the full 500mA available to them. I kept getting under-voltage warnings (the red light on the Pi turns off) with the 2.1 Amp phone charger I was using, so I ended up buying the official 2.5 Amp Raspberry Pi charger, which powers the Raspberry Pi 2 plus the 3G modem perfectly.

I bought the smallest (& cheapest) class 10 Micro SDHC card possible – 16GB. Of course this is way more than you actually need for a router; 4GB would have been plenty.

The ZTE MF180 USB 3G modem I used is a tricky beast on Linux, thanks to the kernel seeing it as a SCSI CDROM drive initially which masks the USB modem features. Whilst Linux has usb_modeswitch shipping as standard these days, I decided to completely disable the SCSI CDROM feature as per this blog post to avoid the issue entirely.

 

The Software

The Raspberry Pi I was given (thanks Calcinite!) had a faulty GPU so the HDMI didn’t work. Fortunately the Raspberry Pi doesn’t let a small issue like no display hold it back – it’s trivial to flash an image to the SD card from another machine and boot a headless installation.

  1. Download Raspbian minimal/lite (Debian + Raspberry Pi goodness).
  2. Install the image to the SD card using the very awesome Etcher.io (think “safe dd” for noobs) as per the install instructions – I used my iMac for this.
  3. Enable SSH as per instructions: “SSH can be enabled by placing a file named ssh, without any extension, onto the boot partition of the SD card. When the Pi boots, it looks for the ssh file. If it is found, SSH is enabled, and the file is deleted. The content of the file does not matter: it could contain text, or nothing at all.”
  4. Login with username “pi” and password “raspberry”.
  5. Change the password immediately before you put it online!
  6. Upgrade the Pi and enable automated updates in future with:
    apt-get update && apt-get -y upgrade
    apt-get install -y unattended-upgrades

The rest is somewhat specific to your setup, but my process was roughly:

  1. Install apps needed – wvdial for establishing the 3G connection via AT commands + PPP, iptables-persistent for firewalling, libusb-dev for building hub-ctrl and jq for parsing JSON responses.
    apt-get install -y wvdial iptables-persistent libusb-dev jq
  2. Configure a firewall. This is very specific to your network, but you’ll want both ipv4 and ipv6 rules in /etc/iptables/rules.* Generally you’d want something like:
    1. Masquerade (NAT) traffic going out of the ppp+ and eth0 interfaces.
    2. Permit forwarding traffic between the interfaces.
    3. Permit traffic in on port 9000 for the health check server.
  3. Enable IP forwarding (net.ipv4.ip_forward=1) in /etc/sysctl.conf.
  4. Build hub-ctrl. This utility allows the power cycling of the USB controller + attached devices in the Raspberry Pi, which is extremely useful if your 3G modem has terrible firmware (like mine) and sometimes crashes hard.
    wget https://raw.githubusercontent.com/codazoda/hub-ctrl.c/master/hub-ctrl.c
    gcc -o hub-ctrl hub-ctrl.c -lusb
  5. Build pinghttpserver. This is a tiny C-based webserver which we can use to check if the Raspberry Pi is up (Can’t use ICMP as detailed further on).
    wget -O pinghttpserver.c https://gist.githubusercontent.com/jethrocarr/c56cecbf111af8c29791f89a2c30b978/raw/9c53f66fbed609d09652b8c4ceff0194876c05a3/gistfile1.txt
    make pinghttpserver
  6. Configure /etc/wvdial.conf. This will vary by the type of 3G/4G modem and also the ISP in use. One key value is the APN that you use. In my case, I had to set it to “direct” to ensure I got a real public IP address with no firewalling, instead of getting a CGNAT IP, or a public IP with inbound firewalling enabled. This will vary by carrier!
    [Dialer Defaults]
    Init1 = ATZ
    Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
    Init3 = AT+CGDCONT=1,"IP","direct"
    Stupid Mode = 1
    Modem Type = Analog Modem
    Phone = *99#
    Modem = /dev/ttyUSB2
    Username = { }
    Password = { }
    New PPPD = yes
  7. Edit /etc/ppp/peers/wvdial to enable “defaultroute” and “replacedefaultroute” – we want the wireless connection to always be the default gateway when connected!
  8. Create a launcher script and (once tested) call it from /etc/rc.local at boot. This will start up the 3G connection at boot and launch the various processes we need (this could be nicer as a collection of systemd services, but damnit I was lazy, ok?). It also handles reboots and power cycling the USB if problems are encountered, for (an attempt at) automated recovery.
    wget -O 3g_failover_launcher.sh https://gist.githubusercontent.com/jethrocarr/a5dae9fe8523cf74d30a065d77d74876/raw/57b5860a9b3f6a048b02b245f3628ee60ea766dc/3g_failover_launcher.sh
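For reference, the firewall described in step 2 could look something like the following in /etc/iptables/rules.v4 – a deliberately minimal sketch only, assuming eth0 faces the LAN and ppp+ is the 3G uplink; a real ruleset should be much tighter (default drops, SSH restrictions, plus matching ip6tables rules):

```
# /etc/iptables/rules.v4 (sketch, loaded by iptables-persistent)
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
# 2.1: Masquerade (NAT) traffic leaving via either uplink.
-A POSTROUTING -o ppp+ -j MASQUERADE
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
# 2.3: Permit the Route53 health checks to reach pinghttpserver.
-A INPUT -p tcp --dport 9000 -j ACCEPT
# 2.2: Permit forwarding between the LAN and the 3G uplink.
-A FORWARD -i eth0 -o ppp+ -j ACCEPT
-A FORWARD -i ppp+ -o eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
COMMIT
```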

At this point, you should be left with a Raspberry Pi that gets a DHCP lease on its eth0, dials up a connection with your wireless telco and routes all traffic it receives on eth0 to the ppp interface.

In my case, I set up my Mikrotik router with a default gateway route to the Raspberry Pi and the ability to fail over based on distance weightings. If the wired connection drops, the Mikrotik will shovel packets at the Raspberry Pi, which will happily NAT them to the internet.
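On the Mikrotik side, the distance-based failover can be expressed with two default routes – a sketch assuming RouterOS, with 192.168.1.1 as the VDSL modem and 192.168.1.2 as the Raspberry Pi (both addresses hypothetical):

```
# Preferred default route via the VDSL modem; check-gateway takes the
# route out of service when the gateway stops responding to pings.
/ip route add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=1 check-gateway=ping
# Backup route via the FailberryPi, only used while the above is down.
/ip route add dst-address=0.0.0.0/0 gateway=192.168.1.2 distance=2
```

Note that check-gateway only detects a dead next hop, not a dead upstream – see the partial-failure caveat at the end of this post.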

 

The DNS Failover

The work above got me an outbound failover solution, but it’s no good for inbound traffic without a failover DNS record that flips between the wired and wireless connections for the VPN to target.

Because the wireless link would be getting a dynamic IP address, the first requirement was a dynamic DNS service. There are various companies around offering free or commercial products for this, but I chose to use a solution built around AWS Lambda that can be granted access directly to my DNS zone hosted in Route53.

AWS have a nice reference dynamic DNS solution available here that I ended up using (sadly not built on the Serverless framework, so there’s a bit more point+click setup than I’d like, but hey).

Once configured and a small client script installed on the Raspberry Pi, I had reliable dynamic DNS running.

The last bit we need is DNS failover. The solution I used was the native AWS Route53 Health Check feature, where AWS adjust a DNS record based on the health of monitored endpoints.

I set up a CNAME with the wired connection as the “primary” and the wireless connection as the “secondary”. The DNS CNAME will always point to the primary/wired connection unless its health check fails, in which case the CNAME will point to the secondary/wireless connection. If both fail, it fails safe to the primary.

A small webserver (pinghttpserver) that we built earlier is used to measure connectivity – the Route53 Health Check feature unfortunately lacks support for ICMP connectivity tests hence the need to write a tiny server for checking accessibility.

This webserver runs on the Raspberry Pi, but I do a dst port NAT to it on both the wired and wireless connections. If the Pi should crash, the connection will always fail safe to the primary/wired connection since both health checks will fail at once.
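For reference, the failover record pair can be expressed as a Route53 change batch (e.g. fed to `aws route53 change-resource-record-sets`) – a hypothetical sketch where the names, targets and health check IDs are all placeholders:

```
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "vpn.example.com",
        "Type": "CNAME",
        "SetIdentifier": "wired-primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "11111111-2222-3333-4444-555555555555",
        "ResourceRecords": [{ "Value": "wired.example.com" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "vpn.example.com",
        "Type": "CNAME",
        "SetIdentifier": "wireless-secondary",
        "Failover": "SECONDARY",
        "TTL": 60,
        "HealthCheckId": "66666666-7777-8888-9999-000000000000",
        "ResourceRecords": [{ "Value": "wireless.example.com" }]
      }
    }
  ]
}
```

The low TTL matters here – there’s no point failing over in seconds if clients cache the old answer for an hour.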

There is a degree of flexibility to the Route53 health checks – you can use a CloudWatch alarm instead of the HTTP check if desired. In my case, I’m using a Lambda I wrote called “lambda-ping” (creative, I know), which does HTTP “pings” to remote endpoints and records the response code plus latency. (Annoyingly it’s not possible to do ICMP pings with Lambda either, since the containers that Lambdas execute inside lack the CAP_NET_RAW kernel capability, hence the “ping-like” behaviour).

lambda-ping in action

I use this, since it gives me information for more than just my failover internet links (eg my blog, sites, etc) and acts like my Pingdom / Newrelic Synthetics alternative.

 

Final Result

After setting it all up and testing, I’ve installed the Raspberry Pi into the comms cabinet. I was a bit worried that all the metal casing would create a Faraday cage, but it seems to be working OK (I also placed it so that the 3G modem sticks out of the cabinet surrounds).

So far so good, but if I get spotty performance or other issues I might need to consider locating the FailberryPi elsewhere where it can get clear access to the cell towers without disruption (maybe sealed ABS box on the roof?). For my use case, it doesn’t need to be ultra fast (otherwise I’d spend some $ and upgrade to 4G), but it does need to be somewhat consistent and reliable.

Installed on a shelf in the comms cabinet, along side the main Mikrotik router and the VDSL modem

So far it’s working well – the outbound failover could do with some tweaking to better handle partial failures (eg VDSL link up, but no international transit), but the failover for the inbound works extremely well.

Few remaining considerations/recommendations for anyone considering a setup like this:

  1. If using the one telco for both the wireless and the wired connection, you’re still at risk of a common fault taking out both services since most ISPs will share infrastructure at some level – eg the international gateway. Use a completely different provider for each service.
  2. Using two wired ISPs (eg Fibre with VDSL failover) is probably a bit pointless; they’re probably both going back to the same exchange, or along the same conduit waiting for a single backhoe to take them both out at once.
  3. It’s kind of pointless if you don’t put this behind a UPS, otherwise you’ll still be offline when the power goes out. Strongly recommend having your entire comms cabinet on UPS so your wifi, routing and failover all continue to work during outages.
  4. If you failover, be careful about data usage. Your computers won’t know they’re on an expensive mobile connection with limited data and they’ll happily download updates, steam games, backups, etc…. One approach is using a firewall to whitelist select systems only for failover (eg IoT devices, alarm, cameras) and leaving other devices like laptops blocked to prevent too much billshock.
  5. Partial ISP outages are still a PITA. Eg, if routing is broken to some NZ ISPs, but international is fine, the failover checks from ap-southeast-2 won’t trigger. Additional ping scripts could help here (eg check various ISP gateways from the Pi), but that’s getting rather complex and tries to solve a problem that’s never completely fixable.
  6. Just buy a Raspberry Pi. Don’t waste time/effort trying to hack some ancient crap together; it wastes far too much time and often falls flat. And don’t use an old laptop/desktop, there’s too much to fail on them like fans, HDDs, etc. The Pi is solid embedded electronics.
  7. Remember that your Pi is essentially a server attached to the public internet. Make sure you configure firewalls and automatic patching and any other hardening you deem appropriate for such a system. Lock down SSH to keys only, IP restrict, etc.

Easy APT repo in S3

When running a number of Ubuntu or Debian servers, it can be extremely useful to have a custom APT repo for uploading your own packages, or third party packages that lack their own good repositories to subscribe to.

I recently found a nice Ruby utility called deb-s3 which allows easy uploading of dpkg files into an S3-hosted APT repository. It’s much easier than messing around with tools like reprepro and having to s3 cp or sync files up from a local disk into S3.

One main warning: This will create a *public* repo by default since it works out-of-the-box with the stock OS and (in my case) all the packages I’m serving are public open source programs that don’t need to be secured. If you want a *private* repo, you will need to use apt-transport-s3 to support authenticating with S3 to download files and configure deb-s3 for private upload.

Install like any other Ruby Gem:

gem install deb-s3

Adding packages is easy. First make sure your aws-cli is working OK and an S3 bucket has been created, then upload with:

deb-s3 upload \
--bucket example \
--codename codename \
--preserve-versions \
mypackage.deb

You can then add the repo to a Ubuntu or Debian server with:

# We trust HTTPS rather than GPG for this repo - but you can configure
# GPG signing if you prefer.
cat > /etc/apt/sources.list.d/myrepo.list << EOF
deb [trusted=yes] https://example.s3.amazonaws.com codename main
EOF

# and ensure you update the package info on the server
apt-get update

Alternatively, here’s an example of how to add the repo with Puppet:

apt::source { 'myrepo':
 comment        => 'This is our own APT repo',
 location       => 'https://example.s3.amazonaws.com',
 release        => $::os["distro"]["codename"],
 repos          => 'main',
 allow_unsigned => true, # We don't GPG sign, HTTPS only
 notify_update  => true, # triggers apt-get update
}