Tag Archives: ruby

Puppet facts, json and max nesting

I use Puppet for both business and pleasure and my work often involves writing custom Puppet facts to expose various bits of information.

Recently a fact I had written that worked on the development machines started throwing errors when run on our production machines:

Could not retrieve jethros_awesome_fact: nesting of 20 is too deep

After digging around, it turns out this relates to how many levels of nesting the JSON response contains. By default Ruby's JSON parser enforces a maximum nesting depth, I guess to guard against malformed JSON or JSON deliberately nested deeply enough to exhaust the parser.

My fact involved pulling JSON from a local application API and then providing various bits of data from the feed. In the development environments this worked without an issue, but the production systems returned a lot more information via the API feed and broke it.

The fix is pretty easy: just add the :max_nesting => false parameter when parsing the JSON, or set it to a different number of levels if you prefer that approach.

json = JSON.parse(response.body, :max_nesting => false)
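
To see the limit in isolation, here is a minimal sketch that doesn't touch Puppet at all. The helper names nested_json and parses? are made up for illustration, and the depths (30 levels against a limit of 20) are arbitrary values picked to mirror the error above:

```ruby
require 'json'

# Hypothetical helper: build a JSON array nested `depth` levels deep,
# e.g. depth 3 gives "[[[]]]".
def nested_json(depth)
  ('[' * depth) + (']' * depth)
end

# Hypothetical helper: true if the text parses under the given
# :max_nesting setting, false if the parser rejects it as too deep.
def parses?(text, limit)
  JSON.parse(text, :max_nesting => limit)
  true
rescue JSON::NestingError
  false
end

deep = nested_json(30)

puts parses?(deep, 20)     # limit lower than the document's depth
puts parses?(deep, 50)     # limit raised above the document's depth
puts parses?(deep, false)  # limit disabled entirely
```

The first call should be rejected and the other two accepted. Setting a higher number rather than false keeps some protection in place if the feed ever returns something unexpectedly deep.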

Ruby Net::HTTP & Proxies

I ran into a really annoying issue today with Ruby and the Net::HTTP class when trying to make requests out via the restrictive corporate proxy at the office.

The documentation states that “Net::HTTP will automatically create a proxy from the http_proxy environment variable if it is present.” However, I was repeatedly seeing my connections fail, and a tcpdump confirmed that they weren’t even attempting to transit the proxy server.

It turns out that, at least on the Ruby 2.0.0 I was using, this proxy traversal only takes place if Net::HTTP is instantiated as an object. If you invoke one of its class methods directly, it ignores the proxy environment variables entirely.

The following example application demonstrates the issue:

#!/usr/bin/env ruby

require 'net/http'

puts "Your proxy is #{ENV["http_proxy"]}"

puts "This will work with your proxy settings:"
uri             = URI('https://www.jethrocarr.com')
request         = Net::HTTP.new(uri.host, uri.port)
request.use_ssl = (uri.scheme == 'https')   # needed for https:// URLs
response        = request.get(uri.request_uri)
puts response.code

puts "This won't:"
uri = URI('https://www.jethrocarr.com')
response = Net::HTTP.get_response(uri)
puts response.code

Which will give you something like:

Your proxy is http://ihateproxies.megacorp.com:8080
This will work with your proxy settings:
200
This won't:
/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:878:in `initialize': No route to host - connect(2) (Errno::EHOSTUNREACH)
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:878:in `open'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:878:in `block in connect'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/timeout.rb:52:in `timeout'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:877:in `connect'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:862:in `do_start'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:851:in `start'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:582:in `start'
    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/net/http.rb:477:in `get_response'
    from ./proxyexample.rb:18:in `<main>'

Very annoying!
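
One way to sidestep the difference is to stop relying on the automatic behaviour and pass the proxy in explicitly. The following is only a sketch: http_via_proxy is a hypothetical helper name, and it assumes http_proxy holds a full URL such as the one shown above. The actual request is left commented out so the script runs without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a Net::HTTP object for `target` that uses
# the proxy given in `proxy_url` (defaults to the http_proxy variable).
# With no proxy URL, a direct connection is configured instead.
def http_via_proxy(target, proxy_url = ENV['http_proxy'])
  if proxy_url.nil? || proxy_url.empty?
    http = Net::HTTP.new(target.host, target.port, nil)
  else
    proxy = URI(proxy_url)
    http  = Net::HTTP.new(target.host, target.port,
                          proxy.host, proxy.port,
                          proxy.user, proxy.password)
  end
  http.use_ssl = (target.scheme == 'https')
  http
end

target = URI('https://www.jethrocarr.com')
http   = http_via_proxy(target)

if http.proxy?
  puts "Requests will go via #{http.proxy_address}:#{http.proxy_port}"
else
  puts "No proxy configured, requests will go direct"
end

# response = http.get(target.request_uri)
# puts response.code
```

Because the proxy details are set on the object itself, it no longer matters how the connection ends up being started.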

Exposing name servers with Puppet Facts

Carrying on from the last post, I needed a reliable way to point my Nginx configuration at a DNS server to use for resolving backends. The issue is that I wanted my Puppet module to be portable across various environments: some block outbound DNS traffic to external services, and in others (eg the cloud) the networks may be redefined frequently, so maintaining an accurate list of all the name servers would be difficult.

I could have used dnsmasq to set up a localhost resolver, but when it comes to operational servers, simplicity is key. Having yet another daemon that could crash or cause problems is never desirable if there’s a simpler way to solve the issue.

Instead I used Facter (sic), Puppet’s tool for exposing values pulled from the system into variables that can be used in your Puppet manifests or templates. The following custom fact is included in my Puppet module and is run before any configuration is applied to the host running my Nginx configuration:

#!/usr/bin/env ruby
#
# Returns a string with all the IPs of all configured nameservers on
# the server. Useful for including into applications such as Nginx.
#
# I live in mymodulenamehere/lib/facter/nameserver_list.rb
# 

Facter.add("nameserver_list") do
  setcode do
    nameservers = []

    # Find all the nameserver values in /etc/resolv.conf. File.foreach
    # closes the file for us once it has read every line.
    if File.readable?("/etc/resolv.conf")
      File.foreach("/etc/resolv.conf") do |line|
        nameservers << $1 if line =~ /^nameserver\s+(\S+)/
      end
    end

    # If we can't get any result (bad host config?) default to a
    # public DNS server that is likely to be reachable.
    nameservers << '8.8.8.8' if nameservers.empty?

    # Return the list as a single space-separated string
    nameservers.join(" ")
  end
end

On a system with a typically configured /etc/resolv.conf file such as:

search example.com
nameserver 192.168.0.1
nameserver 10.1.1.1

The fact will expose the nameservers in a space-delimited string such as:

# facter -p | grep 'nameserver_list'
nameserver_list => 192.168.0.1 10.1.1.1

I can then use the Fact inside my Puppet templates for Nginx to configure the resolver:

server {
    ...
    resolver <%= @nameserver_list %>;
    resolver_timeout 1s;
    ...
}
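
Puppet templates are ERB, so the substitution can be tried outside Puppet with plain Ruby. This is just a sketch: TemplateScope is a hypothetical stand-in for the scope Puppet hands to a template, where facts appear as instance variables.

```ruby
require 'erb'

# Hypothetical stand-in for Puppet's template scope: the fact value is
# exposed to the template as the instance variable @nameserver_list.
class TemplateScope
  def initialize(nameserver_list)
    @nameserver_list = nameserver_list
  end

  # Render an ERB template string with this object's instance variables
  def render(template)
    ERB.new(template).result(binding)
  end
end

template = "resolver <%= @nameserver_list %>;\nresolver_timeout 1s;\n"
puts TemplateScope.new("192.168.0.1 10.1.1.1").render(template)
```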

This works pretty well, but there are a couple of things to watch out for:

  1. If the Fact fails to execute at all, your configuration will be broken. Having said that, it’s a very simple Fact and there’s not a lot that really could fail (eg no dependencies on other apps/non-standard resources).
  2. Linux hosts resolve DNS using the nameservers in the order they are specified in /etc/resolv.conf. If one fails, they move on and try the next. Nginx differs, and just uses the list of provided nameservers in round-robin fashion. This is fine if your nameservers are all equal, but if some have higher latency or are less reliable than others, it could cause slight delays.
  3. You want to drop the resolver_timeout to 1 second, to ensure a failing nameserver doesn’t hold up re-resolution of DNS for too long. Remember that this re-resolution should only occur when the TTL of the DNS records for the backend has expired, so even if one DNS server is bad, it should have almost no impact on performance for your requests.
  4. Nginx isn’t going to pick up entries in /etc/hosts using these resolvers. This should be common sense, but I thought I’d better put that out there just in case.
  5. This Ruby could be better, but I’m not a dev and hacked it up in 15mins. The regex should probably also be improved to handle some of the more exotic /etc/resolv.confs that I’m sure people manage to write.
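
On the last point, the parsing is easier to tighten up if it is pulled out into a plain function that can be fed sample text. The sketch below is one possible take, not the fact as deployed: parse_nameservers is a hypothetical name, and stripping anything after a # or ; is my own assumption about what an "exotic" resolv.conf might contain.

```ruby
# Hypothetical helper: return the nameserver addresses found in the
# given resolv.conf text, in the order they appear.
def parse_nameservers(resolv_conf_text)
  resolv_conf_text.each_line.map do |line|
    line  = line.sub(/[#;].*/, '')                    # drop comments
    match = line.match(/^\s*nameserver\s+(\S+)/)      # require an address
    match[1] if match
  end.compact
end

sample = <<~CONF
  # generated by dhclient
  search example.com
  nameserver 192.168.0.1
    nameserver 10.1.1.1   # secondary
  ; nameserver 8.8.4.4
  nameserver
CONF

puts parse_nameservers(sample).join(' ')
```

With the sample above this should print the two live addresses and skip the commented-out and empty entries.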

Nginx, reverse proxies and DNS resolution

Nginx is a pretty awesome high performance web server and reverse proxy. It’s often used in conjunction with other HTTP servers such as Java/Tomcat and Ruby/Unicorn, as it allows static content to be served directly from disk by Nginx and for connections from slow clients to be queued and buffered by Nginx, rather than taking up time of the expensive/scarce application server worker processes.

 

A typical Nginx reverse proxy configuration to a single backend using proxy_pass to a local HTTP server application on port 8080 would look something like this:

server {
    ...
    proxy_pass http://localhost:8080;
    ...
}

Another popular approach is having a defined upstream group (which can be used for multiple servers, or a single one if desired), for example:

upstream upstream-localhost {
    server localhost:8080;
}

server {
    ...
    proxy_pass http://upstream-localhost;
    ...
}

Generally this configuration works fine for most of our use cases – we typically have a 1-to-1 mapping between a backend application server and Nginx, so the configuration is very simple and reliable – any issues are usually with the backend application, rather than Nginx itself.

 

However, there are occasions when it’s desirable to have Nginx talking to a backend on another server.

I recently implemented an OAuth2 gateway using Nginx-Lua, with the Nginx gateway doing the OAuth2 authentication in a small Lua module before passing the request through to the backend application. This configuration ran on a pair of bastion servers, which reverse proxy the request through to an Amazon ELB which load balances a number of application servers.

This works perfectly 95% of the time, but Amazon ELBs (even internal ones) have a tendency to change their IP addresses. Normally this doesn’t matter, since you reference ELBs via their DNS name rather than their IP address. However, the default behaviour of the Nginx upstream and proxy modules is to resolve DNS at startup and not to re-resolve it while Nginx is running.

This leads to a situation where the Amazon ELB IP address changes and Amazon updates the DNS record, but Nginx never re-resolves the DNS record and stays pointing at the old IP address. Requests to the backend then start failing once Amazon stops serving traffic on the old IP address.

This lack of re-resolution of backends is a known limitation/issue with Nginx. Thankfully there is a workaround, described in a mailing list post, to force Nginx to re-resolve addresses: set proxy_pass to a variable. This forces re-resolution of the DNS names, as Nginx treats variables differently to static configuration.

server {
    ...
    resolver 127.0.0.1;
    set $backend_upstream "http://dynamic.example.com:80";
    proxy_pass $backend_upstream;
    ...
}

 

A resolver (DNS server address) must also be configured in Nginx when using parametrised backends. Nginx is unable to use the local OS resolver here, so the resolver must point directly at a name server IP address.

If your name servers aren’t predictable, you could install something like dnsmasq to provide a local resolver on 127.0.0.1 which then forwards to the dynamically assigned name server, or take the approach of pulling the name server details from the host using something like Puppet Facts and then writing it into the configuration file when it’s generated on the host.

Nginx >= 1.1.9 will re-resolve DNS records based on their TTL, but it’s possible to override this with any value desired using the valid parameter of the resolver directive (for example, valid=30s). To verify correct behaviour, tcpdump will quickly show whether re-resolution is working.

# tcpdump -i eth0 port 53
15:26:00.338503 IP nginx.example.com.53933 > 8.8.8.8.domain: 15459+ A? dynamic.example.com. (54)
15:26:00.342765 IP 8.8.8.8.domain > nginx.example.com.53933: 15459 1/0/0 A 10.1.1.1 (70)
...
15:26:52.958614 IP nginx.example.com.48673 > 8.8.8.8.domain: 63771+ A? dynamic.example.com. (54)
15:26:52.959142 IP 8.8.8.8.domain > nginx.example.com.48673: 63771 1/0/0 A 10.1.1.2 (70)

It’s a bit of an annoyance in an otherwise fantastic application, but as long as you are aware of the limitation, it is not too difficult to resolve the issue by a bit of configuration adjustment.