Tag Archives: code

Any posts relating to software that I have developed, including contributions to other projects, but excluding any packaging work.

Experiments with Local LLMS for agentic coding

I don’t often do a heap of hands on stuff these days as a manager, hence the lack of tech blog posts in the past… 5+ years, but I’ve been digging into LLMs a bunch lately, especially in the area of agentic code (“vibe coding”) to see what the state of the technology is.

I’d describe myself as an AI realist, I can see the massive capabilities of the technology and use it daily, but I also don’t fully buy the hype that we’ll have thought leaders vibe coding entire applications without understanding the technology anytime soon. A capable understanding of what the heck it’s doing and what to instruct it to do is a key requirement IMHO. That said, I like to regularly challenge my assumptions around technology and nothing beats that than getting hands on with it.

I’ve used cloud based frontier models like GPT-4.1, Claude, etc. These have been surprisingly good when doing greenfield development, especially in well understood domains like web SaaS development coupled with clear requirements in prompts.

One challenge I can see is the rate of token consumption, especially once the current VC-fueled loss leader era runs dry and we’re actually paying for the GPU time consumed. So wanted to run some tests with local LLMs and see what’s possible there with the technology available today.

To be clear, I didn’t expect local to beat big cloud, but I was interested to see exactly how far behind the local options are in terms of hardware+model capabilities.

The Setup

I picked a simple example – an old Lambda I wrote back in 2017 and have basically not maintained since as it’s feature complete and AWS let me keep on using the node14 runtime. (The day Amazon introduces extended support pricing for older Lambda runtimes will be a tough day for me 😅).

This is ideal for a test of refactoring legacy code:

  • Old JS standards and library choices
  • Old v2 AWS SDK
  • Last built on Node 14
  • Using early pre-release version of the Serverless framework.
  • Simple “one file” app that shouldn’t tax any models with too much complexity.

For the test I enrolled my Windows gaming machine for the task. Unfortunately the AMD RX580 8GB GPU it had, whilst great for my basic gaming needs, is pretty unloved by the AI development community and wasn’t going to cut it (side note: this GPU is abandoned in ROCM but you can get it to run with llama.cpp using Vulkan, so it’s not incapable, just not the prime choice in an ecosystem that is extremely NVIDA centric. The vRAM size is also pretty small vs the sizes needed these days).

Big fan of these small mini ITX tower builds – enough room to fit an entire GPU without being a massive tower.

Given the GPU issues, I upgraded it to an 5060ti, which is the most cost effective option I could get ($1000 NZD / ~$590 USD) that could run models reasonably well without breaking the bank. To get more vRAM, I’d end up needing a 5090 which comes in a whopping $5000 NZD / ~$3000 USD which is a bit more than I’d like to spend on this experiment.

End result is a machine with:

  • Windows 11 Home edition
  • AMD Ryzen 7 3700X
  • 32GB DDR4
  • NVIDIA RTX 5060 Ti 16GB

For software, I ran Ollama directly on the host OS and exposed the connection to my Macbook where I was running development tooling and using vscode as the editor.

One challenge is that vscode’s Copilot extension on paper offers support for Local LLM models as well as the cloud models, but Microsoft intentionally blocks them if you’re getting your Copilot account via a business plan of any kind because F you I guess. I worked around this with a tweak to one line of code and built the extension locally, but ended up needing to abandon it and use the Continue.dev extension due to issues with Ollama + CoPilot extension not working together correctly to handle tools with the model, which I was able to work around mostly with Continue.dev’s more flexible options.

Test 1: qwen3-coder:30b

See the PR at https://github.com/jethrocarr/lambda-ping/pull/9 for the full notes, prompts and challenges.

This model is a bit above what my hardware could handle. The model itself is 19GB, but with the larger context windows for agentic workloads, I was consuming almost 26GB of memory which forced it to blend both CPU+RAM, and GPU to operate the model which impacted processing performance considerably.

Because of this, the generation was super slow, it took me a couple hours to complete the project with waiting time for the various prompt runs. Unworkable as a normal workflow. To run this workload, really need 32GB vRAM like an expensive RTX 5090 with 32GB, an (also expensive) Mac Studio with unified memory, or maybe the new framework desktop with it’s AMD AI Max 395+ chip and unified memory.

This model generated valid code, but it wasn’t particularly amazing work. It introduced a bug where only HTTPS requests would work (breaking HTTP) and required a prompt to fix, which it did with making a conditional to handle HTTP vs HTTPs. It also didn’t adopt AWS SDK v3 until specifically instructed to do so.

Test 2: qwen2.5-code:14b

See the PR at https://github.com/jethrocarr/lambda-ping/pull/10 for the full notes, prompts and challenges.

A smaller 9GB model that in theory fits entirely within my GPU, but with the context size, actually took around 19GB forcing again a blend of CPU+RAM and GPU for the task. That said, it was considerably faster with 78% of the workload running inside the GPU and not materially worse than the previous test for this specific coding scenario.

This model also generated valid code although it too had a bug, this time with not tracing duration of the request so was non functional on it’s first iteration. When prompted, it was able to correct.

Test 3: qwen2.5-coder:7b

See the PR at https://github.com/jethrocarr/lambda-ping/pull/11 for the full notes, prompts and challenges.

A smaller model yet again, this one fits entirely within the GPU even with the large context and runs super fast thanks to this. Unfortunately the model is just too simplistic and was unable to build a functional application.

I would describe it as capable of writing “code shaped text”. It technically writes valid Javascript but it just hallucinates up different libraries and mashes library names together. It’s also too simplistic to run properly agentically, so it ended up with me copying it’s output into the editor for each file it generated.

Test 4: Cloud LLM using Copilot and GPT-41.

See the PR at https://github.com/jethrocarr/lambda-ping/pull/12 for the full notes, prompts and challenges.

I was originally intending to use Copilot with a few different models so started with the lesser premium GPT-4.1 and… it just hit it right out of the gate immediately, no point even looking at the others. It essentially took a single prompt to refactor the JS into modern standards, moved to AWS-SDK v3 without prompting and the only other prompts were to clean up some old deps it no longer needed and adjust the node runtime for Lambda.

I did follow up with a stretch goal of trying to move it from Serverless to AWS SAM and it struggled a bit more here. I guess LLMs are a bit like many developers and struggle with devops/infrastructure tasks, so maybe devops has a bit more job security for a while… Essentially it got about 70% of the way there but had some mistakes with IAM policies and wrote files into some weird paths that needed moving around.

Learnings

  • Hardware and affordability is a big challenge to do serious agentic work locally. The hardware exists, but maybe not at a price most are willing to pay, specially when there’s a heap of cloud offerings right now at suspiciously cheap prices. Whether this can be sustained and what it ends up costing is yet TBD, there’s a lot of speculation about what the true operating costs are and pricing models/plans seem to change every 6 months as companies figure out what works.
  • Ultimately if the value is there for the customer, the price being charged by cloud could scale somewhat infinitely. $200/mo for Cursor for a hobbyist is a lot, for a professional development shop, it’s chump change vs labour costs. If AI can make a developer 2x more effective and costs $2,000 /mo? Still a bargain vs labour costs for a business.
  • Local LLMs are a hedge against the above, if a cloud vendor gets too greedy, at certain levels it becomes more cost effectively to lease or buy raw GPU power to run your own models. Of course there’s the issue of NVIDA having a defacto monopoly on AI capable GPU hardware these past few years, but AMD seems to be catching up and bringing out great new products like Strix Halo, enabling products like the Framework Desktop with an AMD AI MAX 395+ with up to 128GB of RAM. It might not have the fastest GPU cores ever, but it sure has a heap of unified memory to give those GPU cores a seriously massive pool of RAM. And of course Apple’s M-series architecture delivers similar capabilities, at a price premium.
  • Time solves the above, the local LLMs feel like where ChatGPT was a few years ago. And computer hardware goes through periods where the technology is extremely expensive before trickling down to the consumer, given a few more years, it’s fesible that we could see GPT-4.1 performance capabilities from affordable commodity hardware.
  • When factoring in memory size, it’s not just the size of the model, it the model + all the context size. Can easily double the effective working memory required making the hardware needed even more expensive.
  • All the models struggled with the more infra/devopsy type workload of needing to update a legacy deployment framework. I think this hints at a bit of the technology limitations right now, this specific scenario won’t have much reference material online to have trained on and requires somewhat specific understanding of the tool and AWS ecosystem. As a former devops and Linux nerd this gives me the warm and fuzzies about job security… until next year’s release of AI lifts the bar again and can just talk directly to AWS or something.
  • Windows is surprisingly… less trash than it used to be. Some key bits:
    • WSL 2 (Linux on Windows) works, and most importantly for this topic, has GPU passthrough support so you can run CUDA workloads on a Linux environment using the NVIDA hardware on the host itself. This is a massive win for anyone doing any kind of AI experimentation but not wanting to dual boot their gaming box or just not comfortable setting up Linux as a lab OS.
    • Windows includes OpenSSH server now. Yeah I was surprised to. In true windows fashion it’s a bit clunky but you can now SSH into powershell once it’s setup. I haven’t yet been able to get SSH keys to work so it’s password auth only right now.
    • Docker on Windows uses the WSL subsystem so again, has GPU access.
    • WinGet gives windows a half usable package manager.
    • If you enable SVM virtualization and your Windows install just immediately bluescreens at boot, make sure mini dump is enabled and you can capture some kernel dumps and open it up to review. In my case, there was a legacy driver that crashed when running under hyper v that I was able to isolate and delete. Reading online, a lot of folks seem to resort to doing a windows reinstall to fix issues with being unable to enable SVM.
    • WSL is a little funky with grabbing a lot of RAM for itself, for someone like myself running a mix of workloads in containers and a on the host OS, I may need to startup/shutdown when changing mode (eg AI lab vs Gaming) as WSL doesn’t dynamically claw back RAM that well so you end up with a Linux VM holding onto 10GB of RAM for 1GB of idle workloads.

Rabbit Hole: Hardware

I haven’t really kept up with hardware in a major way in the past few years, so it was interesting researching ways to run local LLM workloads. ChatGPT is actually very good with an encyclopaedic knowledge of CPUs and chipset PCIe lane setups, and there is a heap of real world reports on Reddit of different hardware setups. Few things I learnt:

  • My AM4 motherboard has a 16x PCIe slot, but it’s a PCIe 3.0 slot. The NVIDIA RTX 5060 Ti is a PCIe 5.0 card but only 8x. This is fine on PCIe 5.0 where the bus is fast but when running at 3.0 speed, it can bottleneck for AI as the effective bandwidth is 8GB/s vs what could be 32 GB/s. It doesn’t impact gaming which doesn’t get close to this much bandwidth, and it doesn’t impact a single model running entirely in GPU, but it does impact any models spilling to CPU+RAM like in my case when doing agentic workloads with large contexts.
  • Consumer AMD and Intel CPUs don’t actually have heaps of PCIe lanes available. For example, AM5 offers 28 lanes which sounds like a heap, until you realise 16x goes directly to the GPU, 4x to the NVMe slot, 4x to the chipset and 4x general purpose (additional NVMe or expansion 4x slot).
  • This means you’re not going to see consumer motherboards with multiple 16x slots, so sorry , you can’t just build one AM5 box and put in 4x 5060s. Or if a board offers it, it might be getting shared, eg the 16x GPU runs as 2x 8x slots, or you have slots hanging off the chipset and bottlenecked at 4x back to CPU.
  • .. unless you go for server hardware, ie Threadripper, Epyc or Xeon which does have serious amounts of lanes. You can get some good deals on used hardware here, but there’s a spanner in the works with the “good priced” deals having older generations of PCIe 3.0 which limits bandwidth.
  • PCIe 3.0,4.0,5.0 – the difference is essentially 2x performance in bandwidth per generation. So it is a pretty big deal for these bandwidth heavy workloads for AI. It’s not such an issue for a single GPU if the workload fits entirely in vRAM but if using multiple GPUs the bandwidth for moving data between them can become a bottleneck.
  • As a side note of a side note, this is why most consumer NVMe NASes kinda suck performance wise – they’re not using chips capable of providing that many high performance PCIe lanes. They have other wins, like size, power, noise and reliability, but performance is a problem.
    • To get actual max performance on NVMe you will need to use the 16x slot with bifurcation to pack in the M2 NVMe cards , but that then only leaves the PCIe 4x slot for networking.
    • And you’ll need to upgrade networking, as 10Gbits is basically nothing vs the speed of NVMe drives when a single drive is doing ~56Gbits. In theory a PCIe 5.0 4x slot can do 128Gbits but I’ve not seen any QSFP+ cards that are PCIe 5.0 and 4x, most seem to be PCIe 3.0 8x cards, it might take a few more years for this hardware to catch up, the server market seems happy enough with 8x cards which is driving this kind of networking gear.
    • So the most cost effective top-performance NVMe NAS right now, might be something like an “X99” motherboard with used Xeons and a heap of PCIe 3.0 lanes capable of supporting both NVME storage and networking cards. Or, Apple M4-series hardware using Thunderbolt attached NVMe and Thunderbolt/USB4 networking to the Mac workstations needing access to the storage.
    • Whilst I was digging in this subject I basically came to the conclusion that high performance NVMe NASes aren’t worth it right now for consumers. You need power hungry or expensive hardware to run the drives and then you get bottlenecked on network and need to spend a bunch on server grade networking gear. Might as well just get Thunderbolt/USB4 NVMe enclosures and directly connect them to your Mac workstation. Even a Macbook Air with it’s 2x ports can support 1x external screen+power delivery, and 1x NVMe drive concurrently which would meet any editing or other high I/O workflow that you’d conceivable want to run on such a machine.

Rabbit Hole: Other Reading

Firebase FCM upstream with Swift on iOS

I’ve been learning a bit of Swift lately in order to write an iOS app for my alarm system. I’m not very good at it yet, but figured I’d write some notes to help anyone else playing with the murky world of Firebase Cloud Messaging/FCM and iOS.

One of the key parts of the design is that I wanted the alarm app and the alarm server to communicate directly with each other without needing public facing endpoints, rather than the conventional design when the app interacts via an HTTP API.

The intention of this design is that it means I can dump all the alarm software onto a small embedded computer and as long as that computer has outbound internet access, it just works™️. No headaches about discovering the endpoint of the service and much more simplified security as there’s no public-facing web server.

Given I need to deliver push notifications to the app, I implemented Google Firebase Cloud Messaging (FCM) – formerly GCM – for push delivery to both iOS and Android apps.

Whilst FCM is commonly used for pushing to devices, it also supports pushing messages back upstream to the server from the device. In order to do this, the server must be implemented as an XMPP server and the FCM SDK be embedded into the app.

The server was reasonably straight forwards, I’ve written a small Java daemon that uses a reference XMPP client implementation and wraps some additional logic to work with HowAlarming.

The client side was a bit more tricky. Google has some docs covering how to implement upstream messaging in the iOS app, but I had a few issues to solve that weren’t clearly detailed there.

 

Handling failure of FCM upstream message delivery

Firstly, it’s important to have some logic in place to handle/report back if a message can not be sent upstream – otherwise you have no way to tell if it’s worked. To do this in swift, I added a notification observer for .MessagingSendError which is thrown by the FCM SDK if it’s unable to send upstream.

class AppDelegate: UIResponder, UIApplicationDelegate, MessagingDelegate {

 func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
   ...
   // Trigger if we fail to send a message upstream for any reason.
   NotificationCenter.default.addObserver(self, selector: #selector(onMessagingUpstreamFailure(_:)), name: .MessagingSendError, object: nil)
   ...
 }

 @objc
 func onMessagingUpstreamFailure(_ notification: Notification) {
   // FCM tends not to give us any kind of useful message here, but
   // at least we now know it failed for when we start debugging it.
   print("A failure occurred when attempting to send a message upstream via FCM")
 }
}

Unfortunately I’m yet to see a useful error code back from FCM in response to any failures to send message upstream – seem to just get back a 501 error to anything that has gone wrong which isn’t overly helpful… especially since in web programming land, any 5xx series error implies it’s the remote server’s fault rather than the client’s.

 

Getting the GCM Sender ID

In order to send messages upstream, you need the GCM Sender ID. This is available in the GoogleService-Info.plist file that is included in the app build, but I couldn’t figure out a way to extract this easily from the FCM SDK. There probably is a better/nice way of doing this, but the following hack works:

// Here we are extracting out the GCM SENDER ID from the Google
// plist file. There used to be an easy way to get this with GCM, but
// it's non-obvious with FCM so here's a hacky approach instead.
if let path = Bundle.main.path(forResource: "GoogleService-Info", ofType: "plist") {
  let dictRoot = NSDictionary(contentsOfFile: path)
  if let dict = dictRoot {
    if let gcmSenderId = dict["GCM_SENDER_ID"] as? String {
       self.gcmSenderId = gcmSenderId // make available on AppDelegate to whole app
    }
  }
}

And yes, although we’re all about FCM now, this part hasn’t been rebranded from the old GCM product, so enjoy having yet another acronym in your app.

 

Ensuring the FCM direct channel is established

Finally the biggest cause I had for upstream message delivery failing, is that I was often trying to send an upstream message before FCM had finished establishing the direct channel.

This happens for you automatically by the SDK whenever the app is loaded into foreground, provided that you have shouldEstablishDirectChannel set to true. This can take up to several seconds after application launch for it to actually complete – which means if you try to send upstream too early, the connection isn’t ready, and your send fails with an obscure 501 error.

The best solution I found was to use an observer to listen to .MessagingConnectionStateChanged which is triggered whenever the FCM direct channel connects or disconnects. By listening to this notification, you know when FCM is ready and capable of delivering upstream messages.

An additional bonus of this observer, is that when it indicates the FCM direct channel is established, by that time the FCM token for the device is available to your app to use if needed.

So my approach is to:

  1. Setup FCM with shouldEstablishDirectChannel set to true (otherwise you won’t be going upstream at all!).
  2. Setup an observer on .MessagingConnectionStateChanged
  3. When triggered, use Messaging.messaging().isDirectChannelEstablished to see if we have a connection ready for us to use.
  4. If so, pull the FCM token (device token) and the GCM Sender ID and retain in AppDelegate for other parts of the app to use at any point.
  5. Dispatch the message to upstream with whatever you want in messageData.

My implementation looks a bit like this:

class AppDelegate: UIResponder, UIApplicationDelegate, MessagingDelegate {

 func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
  ...
  // Configure FCM and other Firebase APIs with a single call.
  FirebaseApp.configure()

  // Setup FCM messaging
  Messaging.messaging().delegate = self
  Messaging.messaging().shouldEstablishDirectChannel = true

  // Trigger when FCM establishes it's direct connection. We want to know this to avoid race conditions where we
  // try to post upstream messages before the direct connection is ready... which kind of sucks.
  NotificationCenter.default.addObserver(self, selector: #selector(onMessagingDirectChannelStateChanged(_:)), name: .MessagingConnectionStateChanged, object: nil)
  ...
 }

 @objc
 func onMessagingDirectChannelStateChanged(_ notification: Notification) {
  // This is our own function listen for the direct connection to be established.
  print("Is FCM Direct Channel Established: \(Messaging.messaging().isDirectChannelEstablished)")

  if (Messaging.messaging().isDirectChannelEstablished) {
   // Set the FCM token. Given that a direct channel has been established, it kind of implies that this
   // must be available to us..
   if self.registrationToken == nil {
    if let fcmToken = Messaging.messaging().fcmToken {
     self.registrationToken = fcmToken
     print("Firebase registration token: \(fcmToken)")
    }
   }

   // Here we are extracting out the GCM SENDER ID from the Google PList file. There used to be an easy way
   // to get this with GCM, but it's non-obvious with FCM so we're just going to read the plist file.
   if let path = Bundle.main.path(forResource: "GoogleService-Info", ofType: "plist") {
    let dictRoot = NSDictionary(contentsOfFile: path)
     if let dict = dictRoot {
      if let gcmSenderId = dict["GCM_SENDER_ID"] as? String {
       self.gcmSenderID = gcmSenderId
     }
    }
   }

  // Send an upstream message
  let messageId = ProcessInfo().globallyUniqueString
  let messageData: [String: String] = [
   "registration_token": self.registrationToken!, // In my use case, I want to know which device sent us the message
   "marco": "polo"
  ]
  let messageTo: String = self.gcmSenderID! + "@gcm.googleapis.com"
  let ttl: Int64 = 0 // Seconds. 0 means "do immediately or throw away"

  print("Sending message to FCM server: \(messageTo)")

  Messaging.messaging().sendMessage(messageData, to: messageTo, withMessageID: messageId, timeToLive: ttl)
  }
 }

 ...
}

For a full FCM downstream and upstream implementation example, you can take a look at the HowAlarming iOS app source code on Github and if you need a server reference, take a look at the HowAlarming GCM server in Java.

 

Learnings

It’s been an interesting exercise – I wouldn’t particularly recommend this architecture for anyone building real world apps, the main headaches I ran into were:

  1. FCM SDK just seems a bit buggy. I had a lot of trouble with the GCM SDK and the move to FCM did improve stuff a bit, but there’s still a number of issues that occur from time to time. For example: occasionally a FCM Direct Channel isn’t established for no clear reason until the app is terminated and restarted.
  2. Needing to do things like making sure FCM Direct Channel is ready before sending upstream messages should probably be handled transparently by the SDK rather than by the app developer.
  3. I have still yet to get background code execution on notifications working properly. I get the push notification without a problem, but seem to be unable to trigger my app to execute code even with content-available == 1 . Maybe a bug in my code, or FCM might be complicating the mix in some way, vs using pure APNS. Probably my code.
  4. It’s tricky using FCM messages alone to populate the app data, occasionally have issues such as messages arriving out of order, not arriving at all, or occasionally ending up with duplicates. This requires the app code to process, sort and re-populate the table view controller which isn’t a lot of fun. I suspect it would be a lot easier to simply re-populate the view controller on load from an HTTP endpoint and simply use FCM messages to trigger refreshes of the data if the user taps on a notification.

So my view for other projects in future would be to use FCM purely for server->app message delivery (ie: “tell the user there’s a reason to open the app”) and then rely entirely on a classic app client and HTTP API model for all further interactions back to the server.

Detectatron

I recently installed security cameras around my house which are doing an awesome job of recording all the events that take place around my house and grounds (generally of the feline variety).

Unfortunately the motion capture tends to be overly trigger happy and I end up with heaps of recordings of trees waving, clouds moving or insects flying past. It’s not a problem from a security perspective as I’m not missing any events, but it makes it harder to check the feed for noteworthy events during the day.

I decided I’d like to write some logic for processing the videos being generated and decided to write a proof of concept that sucks video out of the Ubiquiti Unifi Video server and then analyses it with Amazon Web Services new AI product “Rekognition” to identify interesting videos worthy of note.

What this means, is that I can now filter out all the noise from my motion recordings by doing image recognition and flagging the specific videos that feature events I consider interesting, such as footage featuring people or cats doing crazy things.

I’ve got a 20 minute talk about this system which you can watch below, introducing it’s capabilities and how I’m using the AWS Rekognition service to solve this problem. The talk was for the Wellington AWS Users Group, so it focuses a bit more on the AWS aspects of Rekognition and AWS architecture rather than the Unifi video integration side of things.

The software I wrote has two parts – “Detectatron” which is the backend Java service for processing each video and storing it in S3 after processing and the connector I wrote for integration with the Unifi Video service. These can be found at:

https://github.com/jethrocarr/detectatron
https://github.com/jethrocarr/detectatron-connector-unifi

The code quality is rather poor right now – insufficient unit tests, bad structure and in need of a good refactor, but I wanted to get it up sooner rather than later… since perfection is always the enemy of just shipping something.

Note that whilst I’ve only added support for the product I use (Ubiquiti’s Unifi Video), I’ve designed it so that it’s pretty trivial to build other connectors for other platforms. I’d love to see contributions like connectors for Zone Minder and other popular open source or commercial platforms.

If you’re using Unifi Video, my connector will automatically mark any videos it deems as interesting as locked videos, for easy filtering using the native Unifi Video apps and web interface.

It also includes an S3 upload feature – given that I integrated with the Unifi Video software, it was a trivial step to extend it to also upload every video the system records into S3 within a few seconds for off-site retention. This performs really well, my on-prem NVR really struggled to keep up with uploads when using inotify + awscli to upload footage, but using my connector and Detectatron it has no issues keeping up with even high video rates.

Puppet Training

I recently ran a training session for our development team at Fairfax introducing them to the fundamentals of Puppet.

To assist with this training, I’ve developed a bunch of scripts and learning modules which I’ve now open sourced at https://github.com/jethrocarr/puppet-training

Using these modules you can:

  • Provision a pre-configured Puppet master + Puppet client to use for exercises.
  • Learn the basics of an r10k/git workflow for Puppet modules.
  • Create a module and get used to Puppet resources like package, file and service.
  • Learn the basics of ordering and dependencies.
  • Use Hiera to set params.

It’s not as complete a course as I’d like. I did about half a day using these modules and another half the day adhoc. Ideally I’d like to finish off writing modules at some point, but it takes a reasonably long period to write anything like this and there’s only so many hours in a week :-)

Putting up here as they might be of interest to people. PRs are always welcome as well.

Building a mail server with Puppet

A few months back I rebuilt my personal server infrastructure and fully Puppetised everything, even the mail server. Because I keep having people ask me how to setup a mail server, I’ve gone and adjusted my Puppet modules to make them suitable for a wider audience and open sourced them.

Hence announcing – https://github.com/jethrocarr/puppet-mail

This module has been designed for hobbyists or small organisation mail server operators whom want an easy solution to build and manage a mail server that doesn’t try to be too complex. If you’re running an ISP with 30,000 mailboxes, this probably isn’t the module for you. But 5 users? Yourself only? Keep on reading!

Mail servers can be difficult to configure, particularly when figuring out the linking between MTA (eg Postfix) and LDA (eg Dovecot) and authentication (SASL? Cyrus? Wut?), plus there’s the added headaches of dealing with spam and making sure your configuration is properly locked down to prevent open relays.

By using this Puppet module, you’ll end up with a mail server that:

  • Uses Postfix as the MTA.
  • Uses Dovecot for providing IMAP.
  • Enforces SSL/TLS and generates a legitimate cert automatically with LetsEncrypt.
  • Filters spam using SpamAssassin.
  • Provides Sieve for server-side email filtering rules.
  • Simple authentication against PAM for easy management of users.
  • Supports virtual email aliases and multiple domains.
  • Supports CentOS (7) and Ubuntu (16.04).

To get started with this module, you’ll need a functional Puppet setup. If you’re new to Puppet, I recommend reading Setting up and using Pupistry for a master-less Puppet setup.

Then it’s just a case of adding the following to r10k to include all the modules and dependencies:

mod 'puppetlabs/stdlib'

# EPEL & Jethro Repo modules only required for CentOS/RHEL systems
mod 'stahnma/epel'
mod 'jethrocarr/repo_jethro'

# Note that the letsencrypt module needs to be the upstream Github version,
# the version on PuppetForge is too old.
mod 'letsencrypt',
  :git    => 'https://github.com/danzilio/puppet-letsencrypt.git',
  :branch => 'master'

# This postfix module is a fork of thias/puppet-postfix with some fixes
# to make it more suitable for the needs of this module. Longer-term,
# expect to merge it into this one and drop unnecessary functionality.
mod 'postfix',
  :git    => 'https://github.com/jethrocarr/puppet-postfix.git',
  :branch => 'master'

And the following your Puppet manifests (eg site.pp):

class { '::mail': }

And in Hiera, define the specific configuration for your server:

mail::server_hostname: 'setme.example.com'
mail::server_label: 'My awesome mail server'
mail::enable_antispam: true
mail::enable_graylisting: false
mail::virtual_domains:
 - example.com
mail::virtual_addresses:
  'nickname@example.com': 'user'
  'user@example.com': 'user'

That’s all the Puppet config done! Before you apply it to the server, you also need to make sure both your forward and reverse DNS is correct in order to be able to get the SSL/TLS cert and also to ensure major email providers will accept your messages.

$ host mail.example.com
mail.example.com has address 10.0.0.1

$ host 10.0.0.1
1.0.0.10.in-addr.arpa domain name pointer mail.example.com.

For each domain being served, you need to setup MX records and also a TXT record for SPF:

$ host -t MX example.com
example.com mail is handled by 10 mail.example.com.

$ host -t TXT example.com
example.com descriptive text "v=spf1 mx -all"

Note that SPF used to have it’s own DNS type, but that was replaced in favour of just using TXT.

The example above tells other mail servers that whatever system is mentioned in the MX record is a legitmate mail server for that domain. For details about what SPF records and what their values mean, please refer to the OpenSPF website.

Finally, you should read the section on firewalling in the README, there are a number of ports that you’ll probably want to restrict to trusted IP ranges to prevent attackers trying to force their way into your system with password guess attempts.

Hopefully this ends up being useful to some people. I’ve replaced my own internal-only module for my mail server with this one, so I’ll continue to dogfood it to make sure it’s solid.

That being said, this module is new and deals with a complex configuration so I’m sure there will be some issues people run into – please raise any problems you have on the Github issues page and I’ll do my best to assist where possible.

Easy IKEv2 VPN for mobile devices (inc iOS)

I recently obtained an iPhone and needed to connect it to my VPN. However my existing VPN server was an OpenVPN installation which works lovely on traditional desktop operating systems and Android, but the iOS client is a bit more questionable having last been updated in September 2014 (pre iOS 9).

I decided to look into what the “proper” VPN option would be for iOS in order to get something that should be supported by the OS as smoothly as possible. Last time I looked this was full of wonderful horrors like PPTP (not actually encrypted!!) and L2TP/IPSec (configuration hell), so I had always avoided like the plague.

However as of iOS 9+, Apple has implemented support for IKEv2 VPNs which offers an interesting new option. What particularly made this option attractive for me, is that I can support every device I have with the one VPN standard:

  • IKEv2 is built into iOS 9 and MacOS El Capitan.
  • IKEv2 is built into Windows 10.
  • Works on Android with a third party client (hopeful for native integration soonish?).
  • Naturally works on GNU/Linux.

Whilst I love OpenVPN, being able to use the stock OS features instead of a third party client is always nice, particularly on mobile where power management and background tasks behaviour can be interesting.

IKEv2 on mobile also has some other nice features, such as MOBIKE, which makes it very seamless when switching between different networks (like the cellular to WiFi dance we do constantly with phones/tablets). This is something that OpenVPN can’t do – whilst it’s generally fast and reliable at establishing a connection, a change in the network means issuing a reconnect, it doesn’t just move the current connection across.

 

Given that I run GNU/Linux servers, I went for one of the popular IPSec solutions available on most distributions – StrongSwan.

Unfortunately whilst it’s technical capabilities are excellent, it’s documentation isn’t great. Best way to describe it is that every option is documented, but what options and why you’d want to use them? Not so much. The “left” vs “right” style documentation is also a right pain to work with, it’s not a configuration format that reads nicely and clearly.

Trying to find clear instructions and working examples of configurations for doing IKEv2 with iOS devices was also difficult and there’s some real traps for young players such as generating SHA1 certs instead of SHA2 when using the tools with defaults.

The other fun is that I also wanted my iOS device setup properly to:

  1. Use certificate based authentication, rather than PSK.
  2. Only connect to the VPN when outside of my house.
  3. Remain connected to the VPN even when moving between networks, etc.

I found the best way to make it work, was to use Apple Configurator to generate a .mobileconfig file for my iOS devices that includes all my VPN settings and certificates in an easy-to-import package, but also (critically) allows me to define options that are not selectable to end users, such as on-demand VPN establishment.

 

After a few nights of messing around and cursing the fact that all the major OS vendors haven’t just implemented OpenVPN, I managed to get a working connection. To avoid others the same pain, I considered writing a guide – but it’s actually a really complex setup, so instead I decided to write a Puppet module (clone from github / or install from puppetforge) which does the following heavy lifting for you:

  • Installs StrongSwan (on a Debian/derived GNU/Linux system).
  • Configures StrongSwan for IKEv2 roadwarrior style VPNs.
  • Generates all the CA, cert and key files for the VPN server.
  • Generates each client’s certs for you.
  • Generates a .mobileconfig file for iOS devices so you can have a single import of all the configuration, certs and ondemand rules and don’t have to have a Mac to use Apple Configurator.

This means you can save yourself all the heavy lifting and setup a VPN with as little as the following Puppet code:

class { 'roadwarrior':
   manage_firewall_v4 => true,
   manage_firewall_v6 => true,
   vpn_name           => 'vpn.example.com',
   vpn_range_v4       => '10.10.10.0/24',
   vpn_route_v4       => '192.168.0.0/16',
 }

roadwarrior::client { 'myiphone':
  ondemand_connect       => true,
  ondemand_ssid_excludes => ['wifihouse'],
}

roadwarrior::client { 'android': }

The above example sets up a routed VPN using 10.10.10.0/24 as the VPN client range and routes the 192.168.0.0/16 network behind the VPN server back through. (Note that I haven’t added masquerading options yet, so your gateway has to know to route the vpn_range back to the VPN server).

It then defines two clients – “myiphone” and “android”. And in the .mobileconfig file generated for the “myiphone” client, it will specifically generate rules that cause the VPN to maintain a constant connection, except when connected to a WiFi network called “wifihouse”.

The certs and .mobileconfig files are helpfully placed in  /etc/ipsec.d/dist/ for your rsync’ing pleasure including a few different formats to help load onto fussy devices.

 

Hopefully this module is useful to some of you. If you’re new to Puppet but want to take advantage of it, you could always check out my introduction to Puppet with Pupistry guide.

If you’re not sure of my Puppet modules or prefer other config management systems (or *gasp* none at all!) the Puppet module should be fairly readable and easy enough to translate into your own commands to run.

There a few things I still want to do – I haven’t yet done IPv6 configuration (which I’ll fix since I run a dual-stack network everywhere) and I intend to add a masquerade firewall feature for those struggling with routing properly between their VPN and LAN.

I’ve been using this configuration for a few weeks on a couple iOS 9.3.1 devices and it’s been working beautifully, especially with the ondemand configuration which I haven’t been able to do on any other devices (like Android or MacOS) yet unfortunately. The power consumption overhead seems minimal, but of course your mileage may vary.

It would be good to test with Windows 10 and as many other devices as possible. I don’t intend for this module to support non-roadwarrior type configs (eg site-to-site linking) to keep things simple, but happy to merge any PRs that make it easier to connect more mobile devices or branch routers back to a main VPN host. Also happy to merge PRs for more GNU/Linux distribution support- currently only support Debian/Ubuntu, but it shouldn’t be hard to add others.

If you’re on Android, this VPN will work for you, but you may find the OpenVPN client better and more flexible since the Android client doesn’t have the same level of on demand functionality that iOS has built in. You may also find OpenVPN a better option if you’re regularly using restrictive networks that only allow “HTTPS” out, since it can be run on TCP/443, whereas StrongSwan IKEv2 runs on UDP port 500 or 4500.

Node.js deployments at Fairfax with Code Deploy, Codeship and 12factor

This week I presented at the Node.js Wellington meetup around the tooling we have setup at Fairfax for running micro services for Node.js apps.

Essentially we have a workflow that uses Codeship for CI/CD and AWS Code Deploy for deployment. Our apps follow the principals of the Twelve-Factor App making each service simple and consistent to deploy.

This talk covers the reasons for this particular approach, the technologies used and offers a look at our stack including infrastructure and the deployment pipeline.

Whilst this talk is Node.js specific, we use the same technology for both Node.js and Java microservices and will shortly be standardising our Ruby applications on this approach as well.

More Puppet Stuff

I’ve been continuing to migrate to my new server setup and Puppetising along the way, the outcome is yet more Puppet modules:

  1. The puppetlabs-firewall module performs very poorly with large rulesets, to work around this with my geoip/rirs module, I’ve gone and written puppet-speedychains, which generates iptables chains outside of the one-rule, one-resource Puppet logic. This allows me to do thousands of results in a matter of seconds vs hours using the standard module.
  2. If you’re doing Puppet for any more than a couple of users and systems, at some point you’ll want to write a user module that takes advantage of virtual users to make it easy to select which systems should have a specific user account on it. I’ve open sourced my (very basic) virtual user module as a handy reference point, including examples on how to use Hiera to store the user information.

Additionally, I’ve been working on Pupistry lightly, including building a version that runs on the ancient Ruby 1.8.7 versions found on RHEL/CentOS 5 & 6. You can check out this version in the legacy branch currently.

I’m undecided about whether or not I merge this into the main branch, since although it works fine on newer Ruby versions, I’m not sure if it could limit me significantly in future or not, so it might be best to keep the legacy branch as special thing for ancient versions only.

Puppet modules

I’m in the middle of doing a migration of my personal server infrastructure from a 2006-era colocation server onto modern cloud hosting providers.

As part of this migration, I’m rebuilding everything properly using Puppet (use it heavily at work so it’s a good fit here) with the intention of being able to complete server builds without requiring any manual effort.

Along the way I’m finding gaps where the available modules don’t quite cut it or nobody seems to have done it before, so I’ve been writing a few modules and putting them up on GitHub for others to benefit/suffer from.

 

puppet-hostname

https://github.com/jethrocarr/puppet-hostname

Trying to do anything consistently with host naming is always fun, since every organisation or individual has their own special naming scheme and approach to dealing with the issue of naming things.

I decided to take a different approach. Essentially every cloud provider will give you a source of information that could be used to name your instance whether it’s the AWS Instance ID, or a VPS provider passing through the name you gave the machine at creation. Given I want to treat my instances like cattle, an automatic soulless generated name is perfect!

Where they fall down, is that they don’t tend to setup the FQDN properly. I’ve seen a number of solution to this including user data setup scripts, but I’m trying to avoid putting anything in user data that isn’t 100% critical and sticking to my Pupistry bootstrap so I wanted to set my FQDN via Puppet itself.

(It’s even possible to set the hostname itself if desired, you can use logic such as tags or other values passed in as facts to define what role a machine has and then generate/set a hostname entirely within Puppet).

Hence puppet-hostname provides a handy way to easily set FQDN (optionally including the hostname itself) and then trigger reloads on name-dependent services such as syslog.

None of this is revolutionary, but it’s nice getting it into a proper structure instead of relying on yet-another-bunch-of-userdata that’s specific to my systems. The next step is to look into having it execute functions to do DNS changes on providers like Route53 so there’s no longer any need for user data scripts being run to set DNS records at startup.

 

puppet-rirs

https://github.com/jethrocarr/puppet-rirs

There are various parts of my website that I want to be publicly reachable, such as the WordPress login/admin sections, but at the same time I also don’t want them accessible by any muppet with a bot to try and break their way in.

I could put up a portal of some kind, but this then breaks stuff like apps that want to talk with those endpoints since they can’t handle the authentication steps. What I can do, is setup a GeoIP rule that restricts access to the sections to the countries I’m actually in, which is generally just NZ or AU, to dramatically reduce the amount of noise and attempts people send my way, especially given most of the attacks come from more questionable countries or service providers.

I started doing this with mod_geoip2, but it’s honestly a buggy POS and it really doesn’t work properly if you have both IPv4 and IPv6 connections (one or another is OK). Plus it doesn’t help me for applications that support IP ACLs, but don’t offer a specific GeoIP plugin.

So instead of using GeoIP, I’ve written a custom Puppet function that pulls down the IP assignment lists from the various Regional Internet Registries and generate IP/CIDR lists for both IPv4 and IPv6 on a per-country basis.

I then use those lists to populate configurations like Apache, but it’s also quite possible to use it for other purposes such as iptables firewalling since the generated lists can be turned into Puppet resources. To keep performance sane, I cache the processed output for 24 hours and merge any continuous assignment blocks.

Basically, it’s GeoIP for Puppet with support for anything Puppet can configure. :-)

 

puppet-digitalocean

https://github.com/jethrocarr/puppet-digitalocean

Provides a fact which exposes details from the Digital Ocean instance API about the instance – similar to how you get values automatically about Amazon EC2 systems.

 

puppet-initfact

https://github.com/jethrocarr/puppet-initfact

The great thing about the open source world is how we can never agree so we end up with a proliferation of tools doing the same job. Even init systems are not immune, with anything tha intends to run on the major Linux distributions needing to support systemd, Upstart and SysVinit at least for the next few years.

Unfortunately the way that I see most Puppet module authors “deal” with this is that they simply write an init config/file that suits their distribution of choice and conveniently forget the other distributions. The number of times I’ve come across Puppet modules that claim support for Red Hat and Amazon Linux but only ship an Upstart file…. >:-(

Part of the issue is that it’s a pain to even figure out what distribution should be using what type of init configuration. So to solve this, I’ve written a custom Fact called “initsystem” which exposes the primary/best init system on the specific system it’s running on.

It operates in two modes – there is a curated list for specific known systems and then fallback to automatic detection where we don’t have a specific curated result handy.

It supports (or should) all major Linux distributions & derivatives plus FreeBSD and MacOS. Pull requests for others welcome, could do with more BSD support plus maybe even support for Windows if you’re feeling brave.

 

puppet-yas3fs

https://github.com/pcfens/puppet-yas3fs/commit/27af462f1ce2fe0610012a508236062e65017b5f

Not my module, but I recently submitted a PR to it (subsequently merged) which introduces support for a number of different distributions via use of my initfact module so it should now run on most distributions rather than just Ubuntu.

If you’re not familiar with yas3fs, it’s a FUSE driver that turns S3+SNS+SQS into a shared filesystem between multiple servers. Ideal for dealing with legacy applications that demand state on disk, but don’t require high I/O performance, I’m in the process of doing a proof-of-concept with it and it looks like it should work OK for low activity sites such as WordPress, although with no locking I’d advise against putting MySQL on it anytime soon :-)

 

These modules can all be found on GitHub, as well as the Puppet Forge. Hopefully someone other than myself finds them useful. :-)

Baking images with Packer & Pupistry

One of the common issues when building modern infrastructure-as-code style systems is that whilst automation is great, it also has a habit of failing at the worst possible time. There’s nothing quite like the fun of trying to autoscale only to find that a newer version of a package breaks compatibility or the repository mirror or Puppet master has gone offline breaking the whole carefully tuned process.

Naturally this is an issue. And whilst I’ve seen some organisations simply ignore the issue and place trust in their repos and configuration management servers, I’m also too pessimistic about technology to trust numerous components for any mission critical applications.

Fortunately there is a solution – we can bake a machine image that has all the applications and configuration pre-applied, so that autoscaling has no third party dependencies (or as close to no dependencies as we can get).

Baking has negative connotations of the bad old days when engineers would assemble custom machine images by hand and then copy them to build new systems, but it doesn’t have to be that way. We can still respect infrastructure-as-code principals and use modern tools like Puppet and Packer to reliably build consistent images as needed.

These images could be as simple as a base AMI image for Amazon AWS which includes the stock OS image plus your Puppet setup. Or they could be as complex as a fully configured and provisioned application server ready-to-go at the first boot.

To make baking images easier, I’ve added support for generating Packer templates pre-loaded with bootstrap data into Pupistry, making it quick and easy to get started. Here’s how you can use it:

Assumptions/prerequisites:

  • You’ve already got Pupistry setup and functional (No? Read the tutorial here)
  • You’ve installed the third party Packer utility.
  • You have an Amazon AWS account for doing the AMI build. Note that Packer isn’t exclusive to Amazon, so you can also use the same technique with other providers including Digital Ocean and OpenStack – but you’ll have to write your own template.

First we can list what Packer templates are available with Pupistry. If the OS/platform of your choice isn’t included, it’s not particularly hard to add it – these are mostly intended to provide a good starting point for customising your own.

pupistry packer

Screen Shot 2015-05-31 at 23.57.20

We can select a template with –template NAME and also pass the resulting output to a file with –file NAME.  The following will build an Amazon Linux template pre-loaded with Pupistry and the default manifest applied:

pupistry packer --template aws_amazon-any --file output.json

Screen Shot 2015-06-01 at 00.00.01

The generated template is a JSON file that includes various instructions to Packer on how to build the image, as well as the bootstrap data that can also be generated independently with pupistry bootstrap. Various variables can be tweaked, we can export out the variables available and see their default settings with:

packer inspect output.json

Screen Shot 2015-06-01 at 00.02.00

You can see here that we must set a VPC ID and Subnet ID – this is because they differ per AWS account and need to be provided. (Side note: technically you can do EC2 classic with Packer and avoid this, but the VPC instance types like t2 are cheaper to run… and we like cheap :-).

The AWS Region and AWS AMI values are interlinked. If you choose to build for a different region, eg us-west-1, you will need to lookup the appropriate AMI ID for that region and change both the aws_ami and aws_region variables when you bake your image. For some reason Amazon chose to make their AMI IDs specific to a particular region which really does make life a bit more difficult than it really needs to be. :-(

The hostname is worth noting. By default we set it to “packer” so you can target your manifests to handle it specifically, but you could make this anything you wanted such as a particular machine or application type. When using the sample puppet repo that ships by default with Pupistry, we have defined specific configuration to run on the Packer built images:

Screen Shot 2015-06-01 at 00.09.08

Assuming we are happy with the defaults, we just have to set the VPC and Subnet IDs to launch the current image in ap-southeast-2.

packer build \
 -var 'aws_vpc_id=vpc-example' \
 -var 'aws_subnet_id=subnet-example' \
 output.json

As soon as we kick off, we can see that Packer has built a machine in our AWS account to use for the image generation process.

Screen Shot 2015-06-01 at 00.13.53

 

It can take up to a minute for the machine to become available via SSH. Once this happens, Packer opens a connection and starts to feed in the bootstrap commands that have been added into the template by Pupistry.

Screen Shot 2015-06-01 at 00.14.23

This process can take a number of minutes – remember you’re having to install all the various OS updates and then packages and dependencies needed to run Puppet and of course Pupistry itself.

Once all the dependencies are done, Pupistry will run and provision the machine with your Puppet manifests and then return the ID of the AMI that has been generated:

Screen Shot 2015-06-01 at 00.31.57

 

We can see that Packer has now terminated our temporary machine:

Screen Shot 2015-06-01 at 00.22.28

And given us a shiny new AMI in return:

Screen Shot 2015-06-01 at 00.34.14

 

We can now use that AMI to launch a new machine and check out what Pupistry did. For convenience, there is a launch button on the AMI page that will build a new machine for the selected AMI, however you can also take the AMI ID and use it in CloudFormation, from the API or from the usual instance creation screen.

Connecting to the newly spun up instance using our fresh AMI, we can see that it has had the Pupistry rules for the packer node applied and we can also set that the daemon is configured and running in the background.

Except that it took less than 1 minute, rather than needing 5+ minutes to do all the usual updates and dependency installation. And there was no risk of a broken repository or package preventing the launch of our machine. If it was an application server, we could have preloaded it and thrown it right into an ELB within 1 minute after it starting up – that’s ideal for autoscaling!

Screen Shot 2015-06-01 at 00.38.28

Packer supports a number of different options and different providers, so don’t be afraid to pull it down and experiment. You can even write your own custom providers if needed.

Sure you could always just write a script that does all the same things as Packer for your cloud provider of choice, but Packer provides a solid framework for doing this stuff in a reliable and reproducible way saving you time and keeping complexity down.