Tag Archives: geek

Anything IT related (which is most things I say) :-)

Your cloud pricing isn’t webscale

Thankfully in 2015 most (but not all) proprietary software providers have moved away from the archaic ideology of software being licensed by the CPU core – a concept that reflected the value and importance of systems back when you were buying physical hardware, but one that has been rendered completely meaningless by cloud and virtualisation providers.

Taking its place came the subscription model, popularised by Software-as-a-Service (or “cloud”) products. The benefits are attractive – regular income via customer renewal payments, flexibility for customers wanting to change the level of product or number of systems covered and no CAPEX headaches in acquiring new products to use.

Clients win, vendors win, everyone is happy!

Or maybe not.

 

Whilst the horrible price-by-CPU model has died off, a new model has emerged – price by server. This model assumes that the more servers a customer has, the bigger they are and the more we should charge them.

The model makes some sense in a traditional virtualised environment (think VMWare) where boxes are sliced up and a client runs only as many as they need. You might only have a total of two servers for your enterprise application – primary and DR – each spec’ed appropriately to handle the max volume of user requests.

But the model fails horribly when clients start proper cloud adoption. Suddenly that one big server gets sliced up into 10 small servers which come and go by the hour as they’re needed to supply demand.

DevOps techniques such as configuration management suddenly turn the effort of running dozens of servers into the same as running a single machine, so there’s no longer any reason to constrain yourself to a single machine.

It gets worse if the client decides to adopt microservices, where each application gets split off into its own server (or container, e.g. Docker/CoreOS). And it’s going to get very weird when we start using compute-less computing more with services like Lambda and Hoist, because who knows how many server licenses you need to run an application that doesn’t even run on a server that you control?

 

Really the per-server model for pricing is as bad as the per-core model, because it no longer has any reflection on the size of an organisation, the amount they’re using a product and, most importantly, the value they’re obtaining from the product.

So what’s the alternative? SaaS products tend to charge per-user, but the model doesn’t always work well for infrastructure tools. You could be running monitoring for a large company with 1,000 servers but only have 3 user accounts for a small sysadmin team, which doesn’t really work for the vendor.

Some products can charge based on volume or API calls, but even this is risky. A heavy micro-service architecture would result in a large number of HTTP calls between applications, so you can hardly say an app with 10,000 req/min is getting 4x the value compared to a client with a 2,500 req/min application – it could be all internal API calls.

 

To give an example of how painful the current world of subscription licensing is with modern computing, let’s conduct a thought exercise and have a look at the current pricing model of some popular platforms.

Let’s go with creating a startup. I’m going to run a small SaaS app in my spare time, so I need a bit of compute, but also need business-level tools for monitoring and debugging so I can ensure quality as my startup grows and get notified if something breaks.

First up I need compute. Modern cloud compute providers *understand* subscription pricing. Their models are brilliantly engineered to offer a price point for everyone. Whether you want a random jump box for $2/month or a $2000/month massive high compute monster to crunch your big-data-peak-hipster-NoSQL dataset, they can deliver the product at the price point you want.

Let’s grab a basic Digital Ocean box. Well actually let’s grab 2, since we’re trying to make a redundant SaaS product. But we’re a cheap-as-chips startup, so let’s grab 2x $5/mo boxes.

Screen Shot 2015-11-03 at 21.46.40

Ok, so far we’ve spent $10/month for our two servers. And whilst Digital Ocean is pretty awesome our code is going to be pretty crap since we used a bunch of high/drunk (maybe both?) interns to write our PHP code. So we should get a real time application monitoring product, like Newrelic APM.

Screen Shot 2015-11-03 at 21.37.46

Woot! Newrelic have a free tier, that’s great news for our SaaS application – but actually it’s not really that useful, it can’t do much tracing and only keeps 24 hours history. Certainly not enough to debug anything more serious than my WordPress blog.

I’ll need the pro account to get anything useful, so let’s add a whopping $149/mo – but actually make that $298/mo since we have two servers. Great value really. :-/

 

Next we probably need some kind of paging for oncall when our app blows up horribly at 4am like it will undoubtedly do. PagerDuty is one of the popular market leaders currently with a good reputation, let’s roll with them.

Screen Shot 2015-11-03 at 21.52.57

Hmm I guess that $9/mo isn’t too bad, although it’s essentially what I’m paying ($10/mo) for the compute itself. Except that it’s kinda useless since it’s USA and their friendly neighbour only and excludes us down under. So let’s go with the $29/mo plan to get something that actually works. $29/mo is a bit much for a $10/mo compute box really, but hey, it looks great next to NewRelic’s pricing…

 

Remembering that my SaaS app is going to be buggier than Windows Vista, I should probably get some error handling setup. That $298/mo Newrelic APM doesn’t include any kind of good error handler, so we should also go get another market leader, Raygun, for our error reporting and tracking.

Screen Shot 2015-11-03 at 22.00.54

For a small company this isn’t bad value really given you get 5 different apps and any number of muppets working with you can get onboard. But it’s still looking ridiculous compared to my $10/mo compute cost.

So what’s the total damage:

Compute: $10/month
Monitoring: $371/month

Ouch! Now maybe as a startup, I’ll churn up that extra money as an investment into getting a good quality product, but it’s a far cry from the day when someone could launch a new product on a shoestring budget in their spare time from their uni laptop.

 

Let’s look at the same thing from the perspective of a large enterprise. I’ve got a mission critical business application and it requires a 20 core machine with 64GB of RAM. And of course I need two of them for when Java inevitably runs out of heap because the business let muppets feed garbage from their IDE directly into the JVM and expected some kind of software to actually appear as a result.

That compute is going to cost me $640/mo per machine – so $1280/mo total. And all the other bits, Newrelic, Raygun, PagerDuty? Still that same $371/mo!

Compute: $1280/month
Monitoring: $371/month

It’s not hard to imagine that the large enterprise is getting much more value out of those services than the small startup and can clearly afford to pay for that in relation to the compute they’re consuming. But the pricing model doesn’t make that distinction.
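To put a number on that mismatch, here’s a quick back-of-the-envelope calculation in Python using only the monthly totals from the two scenarios above. The function name is made up purely for illustration.

```python
def monitoring_ratio(monitoring_per_month, compute_per_month):
    """Return monthly tooling spend as a multiple of monthly compute spend."""
    return monitoring_per_month / compute_per_month

# Totals taken from the two scenarios in the post (USD per month).
startup = monitoring_ratio(371, 10)
enterprise = monitoring_ratio(371, 1280)

print(f"startup:    {startup:.1f}x compute")
print(f"enterprise: {enterprise:.2f}x compute")
```

The startup ends up paying roughly 37 times its compute bill for tooling, whilst the enterprise pays less than a third of its compute bill for exactly the same services.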

 

So given that we now know that per-core pricing is terrible, per-server pricing is terrible and (at least for infrastructure tools) per-user pricing is terrible, what’s the solution?

“Cloud Spend Licensing” [1]

[1] A term I’ve just made up, but sounds like something Gartner spits out.

With Cloud Spend Licensing, the amount charged reflects the amount you spend on compute – this is a much more accurate indicator of the size of an organisation and value being derived from a product than cores or servers or users.

But how does a vendor know what this spend is? This problem is solving itself thanks to compute consumers starting to cluster around a few major public cloud players, the top three being Amazon (AWS), Microsoft (Azure) and Google (Compute Engine).

It would not be technically complicated to implement support for these major providers (and maybe a smattering of smaller ones like Heroku, Digital Ocean and Linode) to use their APIs to suck down service consumption/usage data and figure out a client’s compute spend in the past month.

For customers who can’t (still on VMWare?) or don’t want to provide this data, there can always be the fallback to a more traditional pricing model, whether it be cores, servers or some other negotiation (“enterprise deal”).
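As a rough sketch of how a vendor might wire that decision together: use the reported cloud spend when the customer shares it, otherwise fall back to per-server pricing. Everything here is an assumption for illustration, including the function name, the 3% rate and the $20/server price. No real billing API is called; the spend figures are simply passed in as numbers.

```python
from typing import Mapping, Optional

def monthly_fee(
    spend_by_provider: Optional[Mapping[str, float]],
    rate: float,
    servers: int = 0,
    per_server_price: float = 0.0,
) -> float:
    """Price a tool from reported cloud spend, else fall back to per-server."""
    if spend_by_provider:
        # Customer shared billing data: charge a fixed fraction of total spend.
        return round(sum(spend_by_provider.values()) * rate, 2)
    # No billing data (e.g. still on VMware): traditional per-server pricing.
    return round(servers * per_server_price, 2)

# Hypothetical customers; the 3% rate and $20/server price are made up.
print(monthly_fee({"aws": 900.0, "digitalocean": 100.0}, rate=0.03))
print(monthly_fee(None, rate=0.03, servers=2, per_server_price=20.0))
```

Summing across providers means a customer split over several clouds is still priced on their total footprint.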

 

 

How would this look?

In our above example, for our enterprise compute bill ($1280/mo) the equivalent amount spent on the monitoring products was 23% for Newrelic, 3% for Raygun and 2.2% for PagerDuty (total of 28.2%). Let’s make the assumption this pricing is reasonable for the value of the products gained for the sake of demonstration (glares at Newrelic).

When applied to our $10/month SaaS startup, the bill for these products would be an additional $2.82/month. This may seem so cheap that there will be an incentive to set a minimum price, but it’s vital to avoid doing so:

  • $2.82/mo means anyone starting up a new service uses your product. Because why not, it’s pocket change. That uni student working on the next big thing will use you. The receptionist writing her next mobile app success in her spare time will use you. An engineer in a massive enterprise will use you to quickly POC a future product on their personal credit card.
  • $2.82/mo might only just cover the cost of the service, but you’re not making any profit if they couldn’t afford to use it in the first place. The next best thing to profit is market share – provided that market share has a conversion path to profit in future (something some startups seem to forget, eh Twitter?).
  • $2.82/mo means IT pros use your product on their home servers for fun and then take their learning back to the enterprise. Every one of the providers above should have a ~ $10/year offering for IT pros to use and get hooked on their product, but they don’t. Newrelic is the closest with their free tier. No prizes if you guess which product I use on my personal servers. Definitely no prizes if you guess which product I can sell the benefits of the most to management at work.
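The arithmetic behind these figures is simple enough to sketch in a few lines of Python. The percentages are the rounded ones quoted above (23%, 3% and 2.2% of compute spend); the function names are mine, and Decimal is used only so the cents come out exactly.

```python
from decimal import Decimal

# Percent of monthly compute spend per vendor, as quoted in the post.
RATES_PERCENT = {
    "Newrelic": Decimal("23"),
    "Raygun": Decimal("3"),
    "PagerDuty": Decimal("2.2"),
}

def cloud_spend_bill(compute_spend):
    """Return each vendor's monthly charge for a given compute spend (USD)."""
    spend = Decimal(str(compute_spend))
    return {
        vendor: (spend * rate / 100).quantize(Decimal("0.01"))
        for vendor, rate in RATES_PERCENT.items()
    }

def total_bill(compute_spend):
    """Return the combined monthly charge across all vendors."""
    return sum(cloud_spend_bill(compute_spend).values(), Decimal("0"))

for spend in (10, 200, 1280):
    print(f"${spend}/mo compute -> ${total_bill(spend)}/mo tooling")
```

Note that because the percentages were rounded down, the $1280/mo case works out at about $361 rather than the original $371, which is close enough for a thought exercise.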

 

But what about real earnings?

As our startup grows and gets bigger, it doesn’t matter if we add more servers, or upsize the existing ones to add bigger servers – the amount we pay for the related support applications is always proportionate.

It also caters for the emerging trend of running systems for limited hours or using spot prices – clients and vendors don’t have to worry about figuring out how it fits into the pricing model, instead the scale of your compute consumption sets the price of the services.

Suddenly that $2.82/mo becomes $56.40/mo when the startup starts getting successful and starts running a few computers with actual specs. One day it becomes $371 when they’re running $1280/mo of compute tier like the big enterprise. And it goes up from there.

 

I’m not a business analyst and “Cloud Spend Licensing” may not be the best solution, but goddamn there has to be a more sensible approach than believing someone will spend $371/mo for their $10/mo compute setup. And I’d like to get to that future sooner rather than later please, because there’s a lot of cool stuff out there that I’d like to experiment with more in my own time – and that’s good for both myself and vendors.

 

Other thoughts:

  • “I don’t want vendors to see all my compute spend details” – This would be easily solved by cloud providers exposing the right kind of APIs for this purpose, e.g. “grant vendor XYZ ability to see sum compute cost per month, but no details on what it is”.
  • “I’ll split my compute into lots of accounts and only pay for services where I need it to keep my costs low” – Nothing different to the current situation where users selectively install agents on specific systems.
  • “This one client with an ultra efficient, high profit, low compute app will take advantage of us.” – Nothing different to the per-server/per-core model then, other than the min spend. Your client probably deserves the cheaper price as a reward for not writing the usual terrible inefficient code people churn out.
  • “This doesn’t work for my app” – This model is very specific to applications that support infrastructure, I don’t expect to see it suddenly being used for end user products/services.

Not all routing is equal

Ran into an interesting issue with my Routerboard CRS226-24G-2S+ “Cloud Router Switch” which is basically a smart layer 3 capable switch running Mikrotik’s RouterOS.

Whilst its specs mean it’s intended for switching rather than routing, given it has the full Mikrotik RouterOS on it, it’s entirely possible to drop out a port from the switching hardware and use it to route traffic, in my case, between the LAN and WAN connections.

Routerboard’s website rates its routing capabilities as between 95.9 and 279 Mbits. In my own iperf tests before putting it into action I was able to do around 200Mbits routing. With only 40/10 Mbits WAN performance, this would work fine for my needs until we finally get UFB (fibre-to-the-home) in 2017.

However between this test and putting it into production, it’s ended up with a lot more firewall rules including NAT and when doing some work on the switch, I noticed that the CPU was often hitting the 100% threshold – which is never good for your networking hardware.

I wondered how much impact that maxed out CPU could be having on my WAN connection, so I used the very non-scientific Ookla Speedtest with the CRS doing my routing:

[Image: Ookla Speedtest result 4735498067]

After stripping all the routing work from the CRS and moving it to a small Routerboard 750 ethernet router, I’ve gained a few additional Mbits of performance:

[Image: Ookla Speedtest result 4735587010]

The CRS and the Routerboard 750 both feature a MIPS 24Kc 400MHz CPU, so there’s no effective difference between the devices. In fact the switch is arguably faster as it’s a newer generation chip and has twice the memory, yet it performs worse.

The CPU usage that was formerly pegging at 100% on the CRS dropped to around 30% on the 750 when running these tests, so there’s clearly something going on in the CRS which is giving it a handicap.

The overhead of switching should be minimal in theory since it’s handled by dedicated hardware, however I wonder if there’s something weird like the main CPU having to give up time to handle events/operations for the switching hardware.

So yeah, a bit annoying – it’s still an awesome managed switch, but it would be nice if they dropped the (terrible) “Cloud Router Switch” name and just sell it for what it is – a damn nice layer 3 capable managed switch, but not as a router (unless they give it some more CPU so it can get the job done as well!).

For now the dedicated 750 as the router will keep me covered, although it will cap out at 100Mbits, both in terms of wire speed and routing capabilities so I may need to get a higher specced router come UFB time.

More Puppet Stuff

I’ve been continuing to migrate to my new server setup and Puppetising along the way, the outcome is yet more Puppet modules:

  1. The puppetlabs-firewall module performs very poorly with large rulesets. To work around this with my geoip/rirs module, I’ve gone and written puppet-speedychains, which generates iptables chains outside of the one-rule, one-resource Puppet logic. This allows me to apply thousands of rules in a matter of seconds vs hours using the standard module.
  2. If you’re doing Puppet for any more than a couple of users and systems, at some point you’ll want to write a user module that takes advantage of virtual users to make it easy to select which systems should have a specific user account on it. I’ve open sourced my (very basic) virtual user module as a handy reference point, including examples on how to use Hiera to store the user information.
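To illustrate the general idea behind the first item (rendering a whole chain in one go rather than managing one resource per rule), here’s a small Python sketch that emits a chain in iptables-restore format. This is not how puppet-speedychains is implemented; the function name, chain name and addresses are all made up for the example.

```python
def render_chain(chain, cidrs, target="ACCEPT"):
    """Render one iptables chain as iptables-restore input text."""
    # Table header, then declare the user-defined chain with zeroed counters.
    lines = ["*filter", f":{chain} - [0:0]"]
    # One append rule per source network.
    lines += [f"-A {chain} -s {cidr} -j {target}" for cidr in cidrs]
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"

# Documentation-range addresses, purely as sample input.
print(render_chain("allow_nz", ["192.0.2.0/24", "198.51.100.0/24"]))
```

Output like this can be fed to iptables-restore in a single operation (with --noflush if the other chains should be left alone), which is where the speed-up over thousands of individual rule changes comes from.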

Additionally, I’ve been working on Pupistry lightly, including building a version that runs on the ancient Ruby 1.8.7 versions found on RHEL/CentOS 5 & 6. You can check out this version in the legacy branch currently.

I’m undecided about whether or not I merge this into the main branch, since although it works fine on newer Ruby versions, I’m not sure if it could limit me significantly in future or not, so it might be best to keep the legacy branch as a special thing for ancient versions only.

Finding & purging Puppet exported resources

Puppet exported resources is a pretty awesome feature – essentially it allows information from one node to be used on another to affect the resulting configuration. We use this for clever things like having nodes tell an Icinga/Nagios server what monitoring configuration should be added for them.

Of course like everything in the Puppet universe, it’s not without some catch – the biggest issue I’ve run into is that if you make a mistake and generate bad exported resources it can be extremely hard to find which node is responsible and take action.

For example, recently my Puppet runs started failing on the monitoring server with the following error:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: A duplicate resource was found while collecting exported resources, with the type and title Icinga2::Object::Service[Durp Service Health Check] on node failpet1.nagios.example.com

The error is my fault, I forgot that exported resources must have globally unique names across the entire fleet, so I ended up with 2x “Durp Service Health Check” resources.

The problem is that it’s a big fleet and I’m not sure which of the many durp hosts is responsible. To make it more difficult, I suspect they’ve been deleted which is why the duplication clash isn’t clearing by itself after I fixed it.

Thankfully we can use the Puppet DB command line tools on the Puppet master to search the DB for the specific resource and find which hosts export it:

# puppet query nodes \
--puppetdb_host puppetdb.infrastructure.example.com \
"(@@Icinga2::Object::Service['Durp Service Health Check'])"

durphost1312.example.com
durphost3436.example.com

I can then purge all their data with:

# puppet node deactivate durphost1312.example.com
Submitted 'deactivate node' for durphost1312.example.com with UUID xxx-xxx-xxx-xx

In theory deleted hosts shouldn’t have old data in PuppetDB, but hey, sometimes our decommissioning tool has bugs… :-/
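The uniqueness rule that caused all this is easy to express in code. Here’s a minimal Python sketch that, given a list of which resources each node exports, reports any type and title pair exported by more than one node. The inventory is hand-written sample data (web01.example.com is invented), not something fetched from PuppetDB, and the function name is hypothetical.

```python
from collections import defaultdict

def find_duplicate_exports(exports_by_node):
    """Map each (type, title) exported by more than one node to those nodes."""
    owners = defaultdict(list)
    for node, resources in exports_by_node.items():
        for resource in resources:
            owners[resource].append(node)
    # Exported resources must be unique fleet-wide, so 2+ owners is a clash.
    return {res: sorted(nodes) for res, nodes in owners.items() if len(nodes) > 1}

# Hand-written sample inventory: node -> list of (type, title) it exports.
exports = {
    "durphost1312.example.com": [("Icinga2::Object::Service", "Durp Service Health Check")],
    "durphost3436.example.com": [("Icinga2::Object::Service", "Durp Service Health Check")],
    "web01.example.com": [("Icinga2::Object::Service", "web01 HTTP Check")],
}
print(find_duplicate_exports(exports))
```

Including something node-specific such as the hostname in the resource title (as the web01 entry does) is the usual way to avoid the clash in the first place.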

MacOS won’t build anything? Check xcode license

One of the annoyances of the MacOS platform is that whilst there’s a nice powerful UNIX underneath, there’s a rather dumb layer on top that does silly things like preventing the app store password being saved, or as I found the other day, disabling parts of the build system if the license hasn’t been accepted.

When you first setup MacOS to be useful, you need to install xcode’s build tools and libraries either via the app store, or with:

sudo xcode-select --install

However it seems if xcode gets updated via one of the routine updates, it can require that the license is re-accepted, and until that happens, it disables various parts of the build system.

I found the issue when I suddenly lost the ability to install native ruby gems, eg:

Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

 /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby extconf.rb
checking for BIO_read() in -lcrypto... *** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers. Check the mkmf.log file for more details. You may
need configuration options.

Provided configuration options:
 --with-opt-dir
 --without-opt-dir
 --with-opt-include
 --without-opt-include=${opt-dir}/include
 --with-opt-lib
 --without-opt-lib=${opt-dir}/lib
 --with-make-prog
 --without-make-prog
 --srcdir=.
 --curdir
 --ruby=/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby
 --with-puma_http11-dir
 --without-puma_http11-dir
 --with-puma_http11-include
 --without-puma_http11-include=${puma_http11-dir}/include
 --with-puma_http11-lib
 --without-puma_http11-lib=${puma_http11-dir}/
 --with-cryptolib
 --without-cryptolib
/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:434:in `try_do': The compiler failed to generate an executable file. (RuntimeError)
You have to install development tools first.
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:513:in `block in try_link0'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/tmpdir.rb:88:in `mktmpdir'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:510:in `try_link0'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:534:in `try_link'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:720:in `try_func'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:950:in `block in have_library'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:895:in `block in checking_for'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:340:in `block (2 levels) in postpone'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:310:in `open'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:340:in `block in postpone'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:310:in `open'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:336:in `postpone'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:894:in `checking_for'
 from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/mkmf.rb:945:in `have_library'
 from extconf.rb:6:in `block in <main>'
 from extconf.rb:6:in `each'
 from extconf.rb:6:in `find'
 from extconf.rb:6:in `<main>'


Gem files will remain installed in /var/folders/py/r973xbbn2g57sr4l_fmb9gtr0000gn/T/bundler20151009-29854-mszy85puma-2.14.0/gems/puma-2.14.0 for inspection.
Results logged to /var/folders/py/r973xbbn2g57sr4l_fmb9gtr0000gn/T/bundler20151009-29854-mszy85puma-2.14.0/gems/puma-2.14.0/ext/puma_http11/gem_make.out
An error occurred while installing puma (2.14.0), and Bundler cannot continue.
Make sure that `gem install puma -v '2.14.0'` succeeds before bundling.

The solution is quite simple:

sudo xcodebuild -license

Why Apple thinks their build tools are so important that they require their own license to be accepted every so often is beyond me.

Puppet modules

I’m in the middle of doing a migration of my personal server infrastructure from a 2006-era colocation server onto modern cloud hosting providers.

As part of this migration, I’m rebuilding everything properly using Puppet (use it heavily at work so it’s a good fit here) with the intention of being able to complete server builds without requiring any manual effort.

Along the way I’m finding gaps where the available modules don’t quite cut it or nobody seems to have done it before, so I’ve been writing a few modules and putting them up on GitHub for others to benefit/suffer from.

 

puppet-hostname

https://github.com/jethrocarr/puppet-hostname

Trying to do anything consistently with host naming is always fun, since every organisation or individual has their own special naming scheme and approach to dealing with the issue of naming things.

I decided to take a different approach. Essentially every cloud provider will give you a source of information that could be used to name your instance whether it’s the AWS Instance ID, or a VPS provider passing through the name you gave the machine at creation. Given I want to treat my instances like cattle, an automatic soulless generated name is perfect!

Where they fall down is that they don’t tend to setup the FQDN properly. I’ve seen a number of solutions to this including user data setup scripts, but I’m trying to avoid putting anything in user data that isn’t 100% critical and sticking to my Pupistry bootstrap, so I wanted to set my FQDN via Puppet itself.

(It’s even possible to set the hostname itself if desired, you can use logic such as tags or other values passed in as facts to define what role a machine has and then generate/set a hostname entirely within Puppet).

Hence puppet-hostname provides a handy way to easily set FQDN (optionally including the hostname itself) and then trigger reloads on name-dependent services such as syslog.

None of this is revolutionary, but it’s nice getting it into a proper structure instead of relying on yet-another-bunch-of-userdata that’s specific to my systems. The next step is to look into having it execute functions to do DNS changes on providers like Route53 so there’s no longer any need for user data scripts being run to set DNS records at startup.

 

puppet-rirs

https://github.com/jethrocarr/puppet-rirs

There are various parts of my website that I want to be publicly reachable, such as the WordPress login/admin sections, but at the same time I also don’t want them accessible by any muppet with a bot to try and break their way in.

I could put up a portal of some kind, but this then breaks stuff like apps that want to talk with those endpoints since they can’t handle the authentication steps. What I can do, is setup a GeoIP rule that restricts access to the sections to the countries I’m actually in, which is generally just NZ or AU, to dramatically reduce the amount of noise and attempts people send my way, especially given most of the attacks come from more questionable countries or service providers.

I started doing this with mod_geoip2, but it’s honestly a buggy POS and it really doesn’t work properly if you have both IPv4 and IPv6 connections (one or another is OK). Plus it doesn’t help me for applications that support IP ACLs, but don’t offer a specific GeoIP plugin.

So instead of using GeoIP, I’ve written a custom Puppet function that pulls down the IP assignment lists from the various Regional Internet Registries and generates IP/CIDR lists for both IPv4 and IPv6 on a per-country basis.

I then use those lists to populate configurations like Apache, but it’s also quite possible to use it for other purposes such as iptables firewalling since the generated lists can be turned into Puppet resources. To keep performance sane, I cache the processed output for 24 hours and merge any contiguous assignment blocks.
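For a feel of what that processing involves, here’s a minimal Python sketch (the module itself is a Puppet function, so this is not its code). It assumes the pipe-delimited delegation format the registries publish, registry|cc|type|start|value|date|status, where value is an address count for IPv4 and a prefix length for IPv6. The function name is mine and the sample lines use documentation address ranges rather than real assignments.

```python
import ipaddress

def country_networks(lines, country):
    """Return (ipv4, ipv6) lists of merged networks for one country code."""
    nets_v4, nets_v6 = [], []
    for line in lines:
        fields = line.strip().split("|")
        # Skip comments, header and summary lines, and other countries.
        if len(fields) < 7 or fields[1] != country:
            continue
        kind, start, value = fields[2], fields[3], fields[4]
        if kind == "ipv4":
            # IPv4 records give a host count, which may not be a single CIDR.
            first = ipaddress.IPv4Address(start)
            last = first + (int(value) - 1)
            nets_v4.extend(ipaddress.summarize_address_range(first, last))
        elif kind == "ipv6":
            nets_v6.append(ipaddress.IPv6Network(f"{start}/{value}"))
    # Merge contiguous or overlapping blocks to keep the lists short.
    return (list(ipaddress.collapse_addresses(nets_v4)),
            list(ipaddress.collapse_addresses(nets_v6)))

sample = [
    "apnic|NZ|ipv4|192.0.2.0|128|20150101|allocated",
    "apnic|NZ|ipv4|192.0.2.128|128|20150101|allocated",
    "apnic|AU|ipv4|198.51.100.0|256|20150101|allocated",
    "apnic|NZ|ipv6|2001:db8::|32|20150101|allocated",
]
v4, v6 = country_networks(sample, "NZ")
print([str(n) for n in v4], [str(n) for n in v6])
```

The two adjacent NZ /25 blocks collapse into a single /24, which is the same kind of merging that keeps the generated ACLs a manageable size.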

Basically, it’s GeoIP for Puppet with support for anything Puppet can configure. :-)

 

puppet-digitalocean

https://github.com/jethrocarr/puppet-digitalocean

Provides a fact which exposes details from the Digital Ocean instance API about the instance – similar to how you get values automatically about Amazon EC2 systems.

 

puppet-initfact

https://github.com/jethrocarr/puppet-initfact

The great thing about the open source world is how we can never agree so we end up with a proliferation of tools doing the same job. Even init systems are not immune, with anything that intends to run on the major Linux distributions needing to support systemd, Upstart and SysVinit at least for the next few years.

Unfortunately the way that I see most Puppet module authors “deal” with this is that they simply write an init config/file that suits their distribution of choice and conveniently forget the other distributions. The number of times I’ve come across Puppet modules that claim support for Red Hat and Amazon Linux but only ship an Upstart file…. >:-(

Part of the issue is that it’s a pain to even figure out what distribution should be using what type of init configuration. So to solve this, I’ve written a custom Fact called “initsystem” which exposes the primary/best init system on the specific system it’s running on.

It operates in two modes – there is a curated list for specific known systems and then fallback to automatic detection where we don’t have a specific curated result handy.
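The two-mode approach can be sketched roughly like this in Python (the real fact is written for Facter, so this is only an illustration). The curated table is a tiny example subset keyed on OS name and release, and the path probes are simple heuristics I’ve picked for the sketch rather than the checks the module actually performs.

```python
import os

# Curated answers for known platforms (a small illustrative subset).
CURATED = {
    ("RedHat", "7"): "systemd",
    ("Debian", "8"): "systemd",
    ("Debian", "7"): "sysvinit",
    ("Ubuntu", "14.04"): "upstart",
}

def detect_init(path_exists=os.path.exists):
    """Heuristic fallback: probe well-known paths on the running system."""
    if path_exists("/run/systemd/system"):
        return "systemd"
    if path_exists("/sbin/initctl"):
        return "upstart"
    return "sysvinit"

def initsystem(os_name, release, path_exists=os.path.exists):
    """Curated lookup first, automatic detection second."""
    return CURATED.get((os_name, release)) or detect_init(path_exists)

print(initsystem("Ubuntu", "14.04"))
```

Passing path_exists in as a parameter keeps the detection logic testable on a machine that isn’t running the init system in question.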

It supports (or should) all major Linux distributions & derivatives plus FreeBSD and MacOS. Pull requests for others welcome, could do with more BSD support plus maybe even support for Windows if you’re feeling brave.

 

puppet-yas3fs

https://github.com/pcfens/puppet-yas3fs/commit/27af462f1ce2fe0610012a508236062e65017b5f

Not my module, but I recently submitted a PR to it (subsequently merged) which introduces support for a number of different distributions via use of my initfact module so it should now run on most distributions rather than just Ubuntu.

If you’re not familiar with yas3fs, it’s a FUSE driver that turns S3+SNS+SQS into a shared filesystem between multiple servers. It’s ideal for dealing with legacy applications that demand state on disk, but don’t require high I/O performance. I’m in the process of doing a proof-of-concept with it and it looks like it should work OK for low activity sites such as WordPress, although with no locking I’d advise against putting MySQL on it anytime soon :-)

 

These modules can all be found on GitHub, as well as the Puppet Forge. Hopefully someone other than myself finds them useful. :-)

Baking images with Packer & Pupistry

One of the common issues when building modern infrastructure-as-code style systems is that whilst automation is great, it also has a habit of failing at the worst possible time. There’s nothing quite like the fun of trying to autoscale only to find that a newer version of a package breaks compatibility or the repository mirror or Puppet master has gone offline breaking the whole carefully tuned process.

Naturally this is an issue. And whilst I’ve seen some organisations simply ignore the issue and place trust in their repos and configuration management servers, I’m far too pessimistic about technology to trust numerous components for any mission critical applications.

Fortunately there is a solution – we can bake a machine image that has all the applications and configuration pre-applied, so that autoscaling has no third party dependencies (or as close to no dependencies as we can get).

Baking has negative connotations of the bad old days when engineers would assemble custom machine images by hand and then copy them to build new systems, but it doesn’t have to be that way. We can still respect infrastructure-as-code principles and use modern tools like Puppet and Packer to reliably build consistent images as needed.

These images could be as simple as a base AMI image for Amazon AWS which includes the stock OS image plus your Puppet setup. Or they could be as complex as a fully configured and provisioned application server ready-to-go at the first boot.

To make baking images easier, I’ve added support for generating Packer templates pre-loaded with bootstrap data into Pupistry, making it quick and easy to get started. Here’s how you can use it:

Assumptions/prerequisites:

  • You’ve already got Pupistry setup and functional (No? Read the tutorial here)
  • You’ve installed the third party Packer utility.
  • You have an Amazon AWS account for doing the AMI build. Note that Packer isn’t exclusive to Amazon, so you can also use the same technique with other providers including Digital Ocean and OpenStack – but you’ll have to write your own template.

First we can list what Packer templates are available with Pupistry. If the OS/platform of your choice isn’t included, it’s not particularly hard to add it – these are mostly intended to provide a good starting point for customising your own.

pupistry packer

Screen Shot 2015-05-31 at 23.57.20

We can select a template with --template NAME and also pass the resulting output to a file with --file NAME.  The following will build an Amazon Linux template pre-loaded with Pupistry and the default manifest applied:

pupistry packer --template aws_amazon-any --file output.json

Screen Shot 2015-06-01 at 00.00.01

The generated template is a JSON file that includes various instructions to Packer on how to build the image, as well as the bootstrap data that can also be generated independently with pupistry bootstrap. Various variables can be tweaked. We can list the variables available and see their default settings with:

packer inspect output.json

Screen Shot 2015-06-01 at 00.02.00

You can see here that we must set a VPC ID and Subnet ID – this is because they differ per AWS account and need to be provided. (Side note: technically you can do EC2 classic with Packer and avoid this, but the VPC instance types like t2 are cheaper to run… and we like cheap :-).

The AWS Region and AWS AMI values are interlinked. If you choose to build for a different region, e.g. us-west-1, you will need to look up the appropriate AMI ID for that region and change both the aws_ami and aws_region variables when you bake your image. For some reason Amazon chose to make their AMI IDs specific to a particular region, which makes life a bit more difficult than it needs to be. :-(
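Because the two values have to change together, it can help to pass them as a pair. Here’s a minimal sketch of a shell helper that does that. The function name region_vars is something I’ve made up for illustration, and ami-example is a placeholder rather than a real AMI ID.

```shell
# Print the region and AMI -var flags together so they cannot drift apart.
# Both arguments are required; the AMI must be one that exists in that region.
region_vars() {
  region="$1"
  ami="$2"
  if [ -z "$region" ] || [ -z "$ami" ]; then
    echo "usage: region_vars REGION AMI_ID" >&2
    return 1
  fi
  printf '%s %s %s %s\n' -var "aws_region=$region" -var "aws_ami=$ami"
}

# Example use (left unquoted on purpose so the flags split into separate words):
#   packer build $(region_vars us-west-1 ami-example) \
#     -var 'aws_vpc_id=vpc-example' \
#     -var 'aws_subnet_id=subnet-example' \
#     output.json
```

It doesn’t look up the AMI for you, it just refuses to let you change one value without the other.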

The hostname is worth noting. By default we set it to “packer” so you can target your manifests to handle it specifically, but you could make this anything you wanted such as a particular machine or application type. When using the sample puppet repo that ships by default with Pupistry, we have defined specific configuration to run on the Packer built images:

Screen Shot 2015-06-01 at 00.09.08

Assuming we are happy with the defaults, we just have to set the VPC and Subnet IDs to launch the current image in ap-southeast-2.

packer build \
 -var 'aws_vpc_id=vpc-example' \
 -var 'aws_subnet_id=subnet-example' \
 output.json

As soon as we kick off, we can see that Packer has built a machine in our AWS account to use for the image generation process.

Screen Shot 2015-06-01 at 00.13.53

 

It can take up to a minute for the machine to become available via SSH. Once this happens, Packer opens a connection and starts to feed in the bootstrap commands that have been added into the template by Pupistry.

Screen Shot 2015-06-01 at 00.14.23

This process can take a number of minutes – remember you’re having to install all the various OS updates and then packages and dependencies needed to run Puppet and of course Pupistry itself.

Once all the dependencies are done, Pupistry will run and provision the machine with your Puppet manifests and then return the ID of the AMI that has been generated:

Screen Shot 2015-06-01 at 00.31.57

 

We can see that Packer has now terminated our temporary machine:

Screen Shot 2015-06-01 at 00.22.28

And given us a shiny new AMI in return:

Screen Shot 2015-06-01 at 00.34.14

 

We can now use that AMI to launch a new machine and check out what Pupistry did. For convenience, there is a launch button on the AMI page that will build a new machine for the selected AMI. You can also take the AMI ID and use it in CloudFormation, from the API or from the usual instance creation screen.
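If you’d rather skip the console, launching from the AWS CLI looks roughly like the sketch below. It assumes the AWS CLI is installed and configured. The function name launch_from_ami, the t2.micro instance type and the example IDs are my own placeholders, so swap in whatever suits your account.

```shell
# Launch a single instance from a freshly baked AMI into a given subnet.
launch_from_ami() {
  ami="$1"
  subnet="$2"
  if [ -z "$ami" ] || [ -z "$subnet" ]; then
    echo "usage: launch_from_ami AMI_ID SUBNET_ID" >&2
    return 1
  fi
  # t2.micro is only an assumption here, pick the size your application needs.
  aws ec2 run-instances \
    --image-id "$ami" \
    --instance-type t2.micro \
    --subnet-id "$subnet" \
    --count 1
}

# Example use:
#   launch_from_ami ami-example subnet-example
```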

Connecting to the newly spun up instance using our fresh AMI, we can see that it has had the Pupistry rules for the packer node applied and we can also see that the daemon is configured and running in the background.

The difference is that it took less than 1 minute, rather than needing 5+ minutes to do all the usual updates and dependency installation. And there was no risk of a broken repository or package preventing the launch of our machine. If it was an application server, we could have preloaded it and thrown it right into an ELB within 1 minute of it starting up – that’s ideal for autoscaling!

Screen Shot 2015-06-01 at 00.38.28

Packer supports a number of different options and different providers, so don’t be afraid to pull it down and experiment. You can even write your own custom providers if needed.

Sure you could always just write a script that does all the same things as Packer for your cloud provider of choice, but Packer provides a solid framework for doing this stuff in a reliable and reproducible way saving you time and keeping complexity down.

Easy Lockscreen MacOS

Whilst MacOS is a pretty polished experience, there are some really simple things that are stupidly hard sometimes, such as getting the keybindings to work right for real keyboards or, in this case, getting the screen to be lockable without sleeping the computer.

No matter what configuration I set in power management, the only MacOS keyboard combination that does anything for me (Command + Option + Eject/Power/F12) will not only put up the lock screen, but also immediately sleep the computer, much to the dismay of any background network connections or audio.

One of the issues with MacOS is that for any issue there are several dubious software vendors offering you an app that “fixes” the issue with quality ranging from some excellent utilities all the way to outright dodgy Android/Windows-style crapware addons.

None of these look particularly good. Who the hell wants Android-style swipe unlock on a Mac??

Naturally I’m not keen for some crappy third party app to do something as key as locking my workstation, so I went looking for the underlying way the screen gets locked. From my trawling I found that the following command executed as a normal user will trigger a sleep of the display, but not the whole machine:

pmset displaysleepnow

Turns out getting MacOS to execute some line of shell is disturbingly easy by using the Automator tool (Available in Applications -> Utilities) and creating a new Service.

Screen Shot 2015-05-26 at 23.47.59

Then add the Run Shell Script action from the Library of actions like below:

Screen Shot 2015-05-26 at 23.47.00

Save it with a logical name like “Lock Screen”. It gets saved into ~/Library/Services/ so in theory it should be possible to easily copy it to other machines.

Once saved, your new service will become available to you in System Preferences -> Keyboard -> Shortcuts and will offer you the ability to set a keyboard shortcut.

Screen Shot 2015-05-26 at 23.50.37

And magic, it works. Command + Shift + L is a lot easier in my books than hot corners or clicking stupid menu items. Sadly you don’t have full flexibility of any key, but you should be able to get something that works for you.

 

For reference, here are my other settings windows. First the power management (Energy Saver) settings. I select “Prevent computer from sleeping automatically” to avoid any surprises when sleeping.

Screen Shot 2015-05-27 at 00.14.29

And secondly, your Security & Privacy settings should require a password after sleep/screen saver:

Screen Shot 2015-05-27 at 00.12.07

 

Tested on MacOS 10.10 Yosemite with pretty much a stock OS installation on an iMac 5k – I wouldn’t expect any variation by hardware, but YMMV (Your Mileage May Vary).

Setting up and using Pupistry

As mentioned in my previous post, I’ve been working on an application called Pupistry to help make masterless Puppet deployments a lot easier.

If you’re new to Pupistry, AWS, Git and Puppet, I’ve put together this short walkthrough on how to set up the S3 bucket (and IAM users), the Pupistry application, the Git repo for your Puppet code and building your first server using Pupistry’s bootstrap feature.

If you’re already an established power user of AWS, Git and Puppet, this might still be useful to flick through to see how Pupistry fits into the ecosystem, but a lot of this will be standard stuff for you. More technical details can be found on the application README.

Note that this guide is for Linux or MacOS users. I have no idea how you do this stuff on a Windows machine that lacks a standard unix shell.

 

1. Installation

Firstly we need to install Pupistry on your computer. As a Ruby application, Pupistry is packaged as a handy Ruby gem and can be installed in the usual fashion.

sudo gem install pupistry
pupistry setup

01-install

The gem installs the application and any dependencies. We run `pupistry setup` in order to generate a template configuration file, but we will still need to edit it with specific settings. We’ll come back to that.

You’ll also need Puppet available on your computer to build the Pupistry artifacts. Install via the OS package manager, or with:

sudo gem install puppet

 

2. Setting up AWS S3 bucket & IAM accounts

We need to use an S3 bucket and IAM accounts with Pupistry. The S3 bucket is essentially a cloud-based object store/file server and the IAM accounts are logins that have tight permissions controls.

It’s a common mistake for new AWS users to use the root IAM account details for everything they do, but given that the IAM details will be present on all your servers, you probably want to have specialised accounts purely for Pupistry.

Firstly, make sure you have a functional installation of the AWS CLI (the modern Python one, not the outdated Java one). Amazon have detailed documentation on how to set it up for various platforms, so refer to that for information.

Now you need to create:

  1. An S3 bucket. S3 buckets are like domain names – they have a global namespace across *all* AWS accounts. That means someone might already have a bucket name that you want to use, so you’ll need to choose something unique… and hope.
  2. An IAM account for read-only access which will be used by the servers running Pupistry.
  3. An IAM account for read-write access for your workstation to make changes.

To save you doing this all manually, Pupistry includes a CloudFormation template, which is basically a defined set of instructions for AWS to execute to build infrastructure. In our case, it will do all the above steps for you. :-)

Because of the need for a globally unique name, please replace “unique” with something unique to you.

wget https://raw.githubusercontent.com/jethrocarr/pupistry/master/resources/aws/cfn_pupistry_bucket_and_iam.template

aws cloudformation create-stack \
--capabilities CAPABILITY_IAM \
--template-body file://cfn_pupistry_bucket_and_iam.template \
--stack-name pupistry-resources-unique

Once the create-stack command is issued, you can poll the status of the stack. It needs to be in the “CREATE_COMPLETE” state before you can continue.

aws cloudformation describe-stacks --query "Stacks[*].StackStatus" --stack-name pupistry-resources-unique

02-s3-setup-init
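If you don’t fancy re-running that command by hand, the polling can be wrapped in a small loop. This is only a sketch: wait_for_stack is a name I’ve made up, the 10 second default interval is arbitrary, and it treats anything other than CREATE_IN_PROGRESS or CREATE_COMPLETE as a failure.

```shell
# Poll a CloudFormation stack until it reaches CREATE_COMPLETE.
# Arguments: stack name (defaults to the one used above), seconds between polls.
wait_for_stack() {
  stack="${1:-pupistry-resources-unique}"
  interval="${2:-10}"
  while true; do
    status=$(aws cloudformation describe-stacks \
      --query "Stacks[0].StackStatus" --output text \
      --stack-name "$stack") || return 1
    echo "Stack $stack is $status"
    case "$status" in
      CREATE_COMPLETE) return 0 ;;
      CREATE_IN_PROGRESS) sleep "$interval" ;;
      *) return 1 ;;  # e.g. ROLLBACK_COMPLETE, go and check the stack events
    esac
  done
}

# Example use:
#   wait_for_stack pupistry-resources-unique && echo "ready to continue"
```

Recent versions of the AWS CLI also ship a built-in waiter, aws cloudformation wait stack-create-complete, which does much the same thing without the status output.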

 

If something goes wrong and your stack status is an error, e.g. “ROLLBACK”, the most likely cause is that you chose a non-unique bucket name. If you want easy debugging, log in to the AWS web console and look at the event details of your stack. Once you determine and address the problem, you’ll need to delete and re-create the stack.

04-s3-aws-cfn-gui

AWS’s web UI can make debugging CFN a bit easier to read than the CLI tools thanks to colour coding and it not all being in horrible JSON.

 

Once you have a CREATE_COMPLETE stack, you can then get the stack outputs, which tell you what has been built. These outputs we then pretty much copy & paste into pupistry’s configuration file.

aws cloudformation describe-stacks --query "Stacks[*].Outputs[*]" --stack-name pupistry-resources-unique

03-s3-setup-explain

In case you’re wondering – yes, I have changed the above keys & secrets since doing this demo!! Never share your access and secret keys, and it’s best to avoid committing them to any repo, even if private.

Save the output, as you’ll need the details shortly when we configure Pupistry.

 

3. Set up your Puppetcode Git repository

Optional: You can skip this step if you simply want to try Pupistry using the sample repo, but you’ll have to come back and do this step if you want to make changes to the example manifests.

We use the r10k workflow with Pupistry, which means you’ll need at least one Git repository called the Control Repo.

You’ll probably end up adding many more Git repositories as you grow your Puppet manifests. More information about how the r10k workflow functions can be found here.

To make life easy, there is a sample repo to go with Pupistry that is a ready-to-go Control Repo for r10k, complete with Puppetfile defining what additional modules to pull in, a manifests/site.pp defining a basic example system and base Hiera configuration.

You can use any Git service, however for this walkthrough we’ll use Bitbucket, since it’s free to set up any number of private repos: their pricing model is based on the number of people in a team and is free for under 5 people.

Github’s model of charging per-repo makes the r10k puppet workflow prohibitively expensive, since we need heaps of tiny repos, rather than a few large repos. Which is a shame, since Github has some nice features.

Head to https://bitbucket.org/ and create an account if you don’t already have one. We can use their handy import feature to make a copy of the sample public repo.

Select “Create Repository” and then click “Import” in the top right corner of the window.

05-bitbucket-create

Now you just need to select “GitHub” as a source with the URL of https://github.com/jethrocarr/pupistry-samplepuppet.git and select a name for your new repo:

06-bitbucket-import

Once the import completes, it will look a bit like this:

07-bitbucket-done

The only computer that needs to be able to access this repository is your workstation. The servers themselves never use any of the Git repos, since Pupistry packages up everything it needs into the artifact files.

Finally, if you’re new to Bitbucket, you probably want to import their key into your known hosts file, so Pupistry doesn’t throw errors trying to check out the repo:

ssh-keyscan bitbucket.org >> ~/.ssh/known_hosts

 

4. Configuring Pupistry

At this point we have the AWS S3 bucket, IAM accounts and the Git repo for our control repo in Bitbucket. We can now write the Pupistry configuration file and get started with the tool!

Open ~/.pupistry/settings.yaml with your preferred text editor:

vim ~/.pupistry/settings.yaml

09-config-edit

There are three main sections to configure in the file:

  1. General – We need to define the S3 bucket here. (For our walkthrough, we are leaving GPG signing disabled. It’s not mandatory and GPG is beyond the scope of this walkthrough):
     10-config-general
  2. Agent – These settings impact the servers that will be running Pupistry, but you need to set them on your workstation since Pupistry will test them for you and pre-seed the bootstrap data with the settings:
     11-config-agent
  3. Build – The settings that are used on your workstation to generate artifacts. If you created your own repository in Bitbucket, you need to change the puppetcode variable to the location of your data. If you skipped that step, just leave it on the default sample repo for testing purposes.
     12-config-use-bitbucket

Make sure you set BOTH the agent and the build section access_key_id and secret_access_key using the output from the CloudFormation build in step 2.

 

5. Building an artifact with Pupistry

Now we have our AWS resources, our control repository and our configuration – we can finally use Pupistry and build some servers!

pupistry build

13-pupistry-build

Since this is our first artifact, there won’t be much use in running diff. However, as part of diff Pupistry will verify your agent AWS credentials are correct, so it’s worth doing.

pupistry diff

14-pupistry-diff

We can now push our newly built artifact to S3 with:

pupistry push

15-pupistry-push

In regards to the GPG warning – Pupistry interacts with AWS via secure transport and the S3 bucket can only be accessed via authorised accounts, however the weakness is that if someone does manage to break into your account (because you stuck your AWS IAM credentials all over a blog post like a muppet), an attacker could replace the artifacts with malicious ones and exploit your servers.

If you do enable GPG, this becomes impossible, since only signed artifacts will be downloaded and executed by your servers – an invalid artifact will be refused. So it’s a good added security benefit and doesn’t require any special setup other than getting GPG onto your workstation and setting the ID of the private key in the Pupistry configuration file.

We’ve now got a valid artifact. The next step is building our first server with Pupistry!

 

6. Building a server with Pupistry

Because having to install Pupistry and configure it on every server you ever want to build wouldn’t be a lot of fun manually, Pupistry automates this for you and can generate bootstrap scripts for most popular platforms.

These scripts can be run as part of user data on most cloud providers including AWS and Digital Ocean, as well as cut & paste into the root shell of any running server, whether physical, virtual or cloud-based.

The bootstrap process works by:

  1. Using the default OS tools to download and install Pupistry.
  2. Writing Pupistry’s configuration file and optionally installing the GPG public key to verify against.
  3. Running Pupistry.
  4. Pupistry then pulls down the latest artifact and executes the Puppetcode.
  5. In the case of the sample repo, the Puppetcode includes the puppet-pupistry module. This module does some clever stuff like setting up a pluginsync equivalent for master-less Puppet and installing a system service for the Pupistry daemon to keep it running in the background – just like the normal Puppet agent! This companion module is strongly recommended for all users.

You can get a list of supported platforms for bootstrap mode with:

pupistry bootstrap

Once you decide which one you’d like to install, you can do:

pupistry bootstrap --template NAME

16-pupistry-bootstrap

Pupistry cleverly fills in all the IAM account details and seeds the configuration file based on the settings defined on your workstation. If you want to change behaviours like disabling the daemon, change it in your build host config file and it will be reflected in the bootstrap file.

 

To test Pupistry you can use any server you want, but this walkthrough shows an example using Digital Ocean which is a very low cost cloud compute provider with a slick interface and much easier learning curve than AWS. You can sign up and use them here, shamelessly clicking on my referrer link so my hosting bill gets paid – but also get yourself $10 credit in the process. Sweetas bru!

Once you have set up/logged into your DigitalOcean account, you need to create a new droplet (their terminology for a VM – AWS uses “EC2 Instance”). It can be named anything you want and any size you want, although this walkthrough is tight and suggests the cheapest example :-)

18-digitalocean-create-droplet-1

 

Now it is possible to just boot the Digital Ocean droplet and then cut & paste the bootstrap script into the running machine, but like most cloud providers Digital Ocean supports a feature called User Data, where a script can be pasted to have it execute when the machine starts up.

19-digitalocean-create-droplet-2

AWS users can get their user data in base64 version as well by calling pupistry bootstrap with the --base64 parameter – handy if you want to copy & paste the user data into other files like CloudFormation stacks. Digital Ocean just takes it in plain text like above.
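If you’ve hand-edited a bootstrap script and need to encode it yourself, the standard base64 tool will do the job. The sketch below is a generic helper rather than anything Pupistry ships. The name encode_user_data and the bootstrap.sh filename are just examples.

```shell
# Base64-encode a script file onto a single line, which is handy when the
# consuming tool wants the user data as one value.
encode_user_data() {
  base64 < "$1" | tr -d '\n'
}

# Example use:
#   pupistry bootstrap --template NAME > bootstrap.sh
#   encode_user_data bootstrap.sh
```

The tr call strips the line wrapping that some base64 implementations add to long output.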

Make sure you use the right bootstrap data for the right distribution. There are variations between distributions and sometimes even between versions, hence various different bootstrap scripts are provided for the major distributions. If you’re using something else/fringe, you might have to do some of your own debugging, so I recommend testing with a major distribution first.

20-digitalocean-create-droplet-3

Once you create your droplet, Digital Ocean will go away for 30-60 seconds and build and launch the machine. Once you SSH into it, tail the system log to see the user data executing in the background as the system completes its inaugural startup. The bootstrap script echoes all commands it’s running and their output into syslog for debugging purposes.

21-digitalocean-connect-to-server
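Where the system log lives depends on the distribution: Debian and Ubuntu generally use /var/log/syslog, while the Red Hat family generally uses /var/log/messages. A small sketch that picks whichever one exists (first_readable is a made-up helper name):

```shell
# Print the first file from the argument list that exists and is readable.
first_readable() {
  for candidate in "$@"; do
    if [ -r "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Example use on the new server (as root):
#   tail -f "$(first_readable /var/log/syslog /var/log/messages)"
```

On distributions that only log to the systemd journal neither file will exist, in which case journalctl -f is the equivalent.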

 

Watch the log file until you see the initial Puppet run take place. You’ll see Puppet output followed by Notice: Finished catalog run at some stage of the process. You’ll also see the Pupistry daemon has launched and is now running in the background checking for updates every minute.

21-initial-pupistry-run

If you got this far, you’ve just done a complete build and proven that Pupistry can run on your server without interruption. Because of the user data feature, you can easily automate machine creation and the Pupistry run to completely build servers without ever needing to log in – we only logged in here to see what was going on!

 

7. Using Pupistry regularly

To make rolling out Puppet changes quick and simple, Pupistry sets up a background daemon job via the puppet-pupistry companion module, which installs init config for most distributions covering systemd, upstart and sysvinit. You can check the daemon status and log output on systemd-era distributions with:

service pupistry status

21-pupistry-daemon-details

If you want to test changes, then you may want to stop the daemon whilst you do your testing. Or you can be *clever* and use branches in your control repo – Pupistry daemon defaults to the master branch.

When testing or not using the daemon, you can run Pupistry manually in the same way that you can run the Puppet agent manually:

pupistry apply

22-pupistry-manual

Play around with some of the commands you can do, for example:

Run and only show what would have been done:

pupistry apply --noop

Apply a specific branch (this will work with the sample repo):

pupistry apply --environment exampleofbranch

To learn more about what commands can be run in apply mode, run:

pupistry help apply

 

 

8. Making a change to your control repo

At this point, you have a fully working Pupistry setup that you can experiment with and try new things out. You will want to check out the repo from Bitbucket with:

git clone <repo>

Screen Shot 2015-05-10 at 23.31.02

 

The first change you might want to make is experimenting with some of the examples in your repository and pushing a new artifact:

23-custom-puppetcode-1

 

When Puppet runs, it reads the manifests/site.pp file first for any node configuration. We have a simple default node setup that takes some actions like using some notify resources to display messages to the user. Try changing one of these:

24-custom-puppetcode-2

Make a commit & push the change to Bitbucket, then build a new artifact:

25-custom-puppetcode-3

 

We can now see the diff command in action:

26-custom-puppetcode-4

 

If you’re happy with the changes, you can then push your new artifact to S3 and it will quickly deploy to your servers within the next minute if running the daemon.

27-custom-puppetcode-5

You can also run the Pupistry apply manually on your target server to see the new change:

28-custom-puppetcode-6
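The whole edit cycle from this section can be strung together as a single helper. This is a sketch of my own workflow rather than a Pupistry feature: ship_change is a made-up name, it assumes you run it from inside your control repo checkout, and it stops at the first step that fails.

```shell
# Commit and push a control repo change, then build, review and publish
# a new artifact. Each step only runs if the previous one succeeded.
ship_change() {
  message="$1"
  if [ -z "$message" ]; then
    echo "usage: ship_change 'commit message'" >&2
    return 1
  fi
  git commit -a -m "$message" &&
    git push &&
    pupistry build &&
    pupistry diff &&
    pupistry push
}

# Example use:
#   ship_change "Change the example notify message"
```

In practice you’ll probably want to read the diff output before pushing, so treat the last two steps as something to split apart once you’re dealing with real servers.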

At this point you’ve been able to set up AWS, set up Git, set up Pupistry, build a server and push new Puppet manifests to it! You’re ready to begin your exciting adventure into master-less Puppet and automate all the things!

 

9. Cleanup

Hopefully you like Pupistry and are now hooked, but even if you do, you might want to clean up everything you’ve just created as part of this walkthrough.

First you probably want to destroy your Digital Ocean Droplet so it doesn’t cost you any further money:

29-cleanup-digitialocean

If you want to carry on using Pupistry with your new Bitbucket control repo and your AWS account you can, but if you want to purge them to clean up and start again:

Delete the Bitbucket repo:

30-cleanup-bitbucket

Delete the AWS S3 bucket contents, then tear down the CloudFormation stack to delete the bucket and the users:

31-cleanup-aws
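For the AWS side, the two steps map onto two AWS CLI commands. The wrapper below is a sketch: cleanup_aws is a made-up name, and you need to pass in the bucket name from your stack outputs. Be careful, the first command deletes every object in the bucket.

```shell
# Empty the Pupistry bucket, then delete the stack that owns the bucket
# and the IAM users.
cleanup_aws() {
  bucket="$1"
  stack="${2:-pupistry-resources-unique}"
  if [ -z "$bucket" ]; then
    echo "usage: cleanup_aws BUCKET [STACK_NAME]" >&2
    return 1
  fi
  # The bucket has to be emptied first, the stack is only torn down afterwards.
  aws s3 rm "s3://$bucket" --recursive &&
    aws cloudformation delete-stack --stack-name "$stack"
}

# Example use:
#   cleanup_aws your-bucket-name pupistry-resources-unique
```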

All done – you can re-run this tutorial from clean, or use your newfound knowledge to setup your proper production configuration.

 

Further Information

Hopefully you’ve found this walkthrough (and Pupistry) useful! Before getting started properly on your Pupistry adventure, please do read the full README.md and also my “Introducing Pupistry” blog post.

 

Pupistry is a very new application, so if you find bugs please file an issue in the tracker. It’s also worth checking the tracker for any other known issues with Pupistry before getting started with it in production.

Pull requests for improved documentation, bug fixes or new features are always welcome.

If you are stuck and need support, please file it via an issue in the tracker. If your issue relates *directly* to a step in this tutorial, then you are welcome to post a comment below. I get too many emails, so please don’t email me asking for support on an issue as I’ll probably just discard it.

You may also find the following useful if you’re new to Puppet:

Remember that Pupistry advocates the use of masterless Puppet techniques, which aren’t really properly supported by Puppet Labs. However, Puppet modules will generally work just fine in masterless as well as master-full environments.

Puppet master is pretty standard, whereas Puppet masterless implementations differ all over the place since there’s no “proper” way of doing it. Pupistry hopefully fills this gap and will become a de facto standard for masterless over time.

 

 

Introducing Pupistry

I’ve recently been working to migrate my personal infrastructure from a very conventional and ageing 8 year old colocation server to a new cloud-based approach.

As part of this migration I’m simplifying what I have down to the fewest possible services and offloading a number of them to best-of-breed cloud SaaS providers.

Of course I’m still going to have a few servers for running various applications where it makes the most sense, but ideally it will only be a handful of small virtual machines and a bunch of development machines that I can spin up on demand using cloud providers like AWS or Digital Ocean, only paying for what I use.

 

The Puppet Master Problem

To make this manageable I needed to use a configuration management system such as Puppet to allow the whole build process of new servers to be automated (and fast!). But running Puppet goes against my plan of as-simple-as-possible as it means running another server (the Puppet master). I could have gone for something like Ansible, but I dislike the agent-less approach and prefer to have a proper agent and to be able to build boxes automatically, such as when using autoscaling.

So I decided to use Puppet masterless. It’s completely possible to run Puppet against local manifest files and have it apply them, but there’s the annoying issue of how to get Puppet manifests to servers in the first place… That tends to be left as an exercise to the reader. There are various collections of hacks floating around on the web, and major organisations seem to grow their own homespun tooling to address it.

Just getting a well functioning Puppet masterless setup took far longer than desired and it seems silly given that everyone doing Puppet masterless is going to have to do the same steps over and over again.

User-data is another case of stupidity with every organisation writing their own variation of what is basically the same thing – some lines of bash to get a newly launched Linux instance from nothingness to running Puppet and applying the manifests for that organisation. There’s got to be a better way.

 

The blessing and challenges of r10k

It gets even more complex when you take the use of r10k into account. r10k is a Puppet workflow solution that makes it easy to include various upstream Puppet modules and pin them to specific versions. It supports branches, so you can do clever things like tell one server to apply a specific new branch to test a change you’ve made before rolling it out to all your servers. In short, it’s fantastic and if you’re not using it with Puppet… you should be.

However using r10k does mean you need access to all the git repositories that are being included in your Puppetfile. This is generally dealt with by having the Puppet master run r10k and download all the git repos using a deployer key that grants it access to the repositories.

But this doesn’t work so well when you have to set up deployer access keys for every machine to be able to read every one of your git repositories. And if a machine is ever compromised, its key needs to be changed for every repo and every server again, which is hardly ideal.

r10k’s approach of allowing you to assemble various third party Puppet modules into a (hopefully) coherent collection of manifests is very powerful – grab modules from the Puppet Forge, from Github or from some other third party, r10k doesn’t care, it makes it all work.

But this has the major failing of essentially limiting your security to the trustworthiness of all the third parties you select.

In some cases the author is relatively unknown and could suddenly decide to start including malicious content, or in other cases the security of the platform providing the modules is at risk (e.g. the Puppet Forge doesn’t require any two-factor auth for module authors) and a malicious attacker could attack the platform in order to compromise thousands of machines.

Some organisations fix this by still using r10k but always forking any third party modules before using them, but this has the downside of increased manual overhead to regularly check for new updates to the forked repos and pulling them down. It’s worth it for a big enterprise, but not worth the hassle for my few personal systems.

The other issue aside from security is that if any one of these third party repos ever fails to download (e.g. the repo was deleted), your server would fail to build. Nobody wants to find that someone chose to delete the GitHub repo you rely on just minutes before your production host autoscaled and failed to start up. :-(

 

 

Pupistry – the solution?

I wanted to fix the lack of a consistent robust approach to doing masterless Puppet and provide a good way to allow r10k to be used with masterless Puppet and so in my limited spare time over the past month I’ve been working on Pupistry. (Pupistry? puppet + artistry == Pupistry! Hopefully my solution is better than my naming “genius”…)

Pupistry is a solution for implementing reliable and secure masterless puppet deployments by taking Puppet modules assembled by r10k and generating compressed and signed archives for distribution to the masterless servers.

Pupistry builds on the functionality offered by the r10k workflow but rather than requiring the implementing of site-specific custom bootstrap and custom workflow mechanisms, Pupistry executes r10k, assembles the combined modules and then generates a compressed artifact file. It then optionally signs the artifact with GPG and finally uploads it into an Amazon S3 bucket along with a manifest file.

The masterless Puppet machines then run Pupistry, which checks for a new version of the manifest file. If there is one, it downloads the new artifact and does an optional GPG validation before applying it and running Puppet. Pupistry ships with a daemon which means you can get the same convenience of a standard Puppet master & agent setup and don’t need dodgy cronjobs everywhere.
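To make that flow concrete, here is a rough sketch of one pass of the agent logic in shell. This is not Pupistry’s actual code (Pupistry is written in Ruby). The four helper functions are hypothetical stand-ins that only echo what the real steps would do, and the version string is a made-up placeholder.

```shell
# Hypothetical stand-ins for the real work against S3, GPG and Puppet.
fetch_latest_version() { echo "20150601-abc123"; }  # read the manifest file
download_artifact()    { echo "download $1"; }      # fetch the artifact
verify_artifact()      { echo "verify $1"; }        # optional GPG check
apply_artifact()       { echo "apply $1"; }         # unpack and run Puppet

# One pass of the agent: the expensive steps only run when the version
# named in the manifest differs from the one recorded in the state file.
agent_cycle() {
  state_file="$1"
  latest=$(fetch_latest_version) || return 1
  applied=$(cat "$state_file" 2>/dev/null || true)
  if [ "$latest" = "$applied" ]; then
    echo "up to date"
    return 0
  fi
  download_artifact "$latest" &&
    verify_artifact "$latest" &&
    apply_artifact "$latest" &&
    echo "$latest" > "$state_file"
}

# Example use:
#   agent_cycle /tmp/pupistry-example.state
```

The point of the structure is that the common case, where nothing has changed, costs one small fetch and a string comparison.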

To make life even easier, Pupistry will even spit out bootstrap files for your platform which set up each server from scratch to install, configure and run Pupistry, so you don’t need to write line after line of poorly tested bash code to get your machines online.

It’s also FAST. It can check for a new manifest in under a second, much faster than Puppet master or r10k being run directly on the masterless server.

Because Pupistry is artifact based, you can be sure your servers will always build since all the Puppetcode is packaged up, which is great for autoscaling – although you still want to use a tool like Packer to create an OS image with Pupistry pre-loaded, to remove the dependency on Rubygems and the risk of a newer version of Pupistry failing.

 

Try it!

https://github.com/jethrocarr/pupistry

If this sounds up your street, please take a look at the documentation on the Github page above and also the introduction tutorial I’ve written on this blog to see what Pupistry can do and how to get started with it.

Pupistry is naturally brand new and at MVP stage, so if you find bugs please file an issue in the tracker. It’s also worth checking the tracker for any other known issues with Pupistry before getting started with it in production (because you’re racing to put this brand new unproven app into production right?).

Pull requests for improved documentation, bug fixes or new features are always welcome, as is beer. :-)

I intend to keep developing this for myself as it solves my masterless Puppet needs really nicely, but I’d love to see it become a more popular solution that others are using instead of spinning some home grown weirdness again and again.

I’ve put some time into making it easy to use (I hope) and also written bootstrap scripts for most popular Linux distributions and FreeBSD, but I’d love feedback good & bad. If you’re using Pupistry and love it, let me know! If you tried Pupistry but it had some limitation/issue that prevented you from adopting it, let me know what it was, I might be able to help. Better yet, if you find a blocker to using it, fix it and send me a pull request. :-)