Tag Archives: squid

Funny tasting Squid Resolver

squid_logoSquid is a very popular (and time tested) proxy server, it’s generally the go-to solution for a  proxy server in a *nix environment and is capable of providing general caching proxy services (including transparent) as well as more sophisticated reverse proxy solutions.

I recently ran into an issue where Squid was refusing to resolve some DNS addresses on our network – not an uncommon problem if using a public DNS server instead of an internal-only DNS server by mistake.

The first step was to check the nameservers listed in /etc/resolv.conf and make sure they were correct and returning valid results. In this case they were, all the name servers correctly resolved the address without any issue.

Next step was to check for specific configuration in Squid – some applications like Squid and Nginx allow you to specifically set their nameservers to something other than the contents of /etc/resolv.conf. In this case, there was no such configuration, in fact there was no configuration relating to DNS at all, meaning it would have to fall back to the operating system resolver.

Or does it? Generally Linux applications use the OS resolver which follows a set order to discover hosts defined explicitly in /etc/hosts, or tries the nameservers in /etc/resolv.conf. When either file is changed, the changes are reflected immediately on the next query for those addresses.

However Squid has it’s own approach. Unless it’s using DNS name servers specifically defined in it’s configuration file, instead of using the OS resolver it reads the configuration in /etc/resolv.conf as a once-off startup action, then continues to use the name servers that were defined for the lifetime of the process.

You can see this in the logs – at startup time Squid logs the servers it’s using in cache.log:

# grep nameserver /var/log/squid/cache.log
2014/07/02 11:57:37| Adding nameserver from /etc/resolv.conf

From this, the sequence of events is simple to figure out:

  1. A server was brought online, using a public DNS server that lacked some of our internal records.
  2. Squid was started up, reading in that DNS server from /etc/resolv.conf.
  3. The DNS server addresses were corrected and immediately resolved all other applications – but Squid stuck with the old address still, so continued to refuse the queries.

Resolving the immediate issue is as simple as restarting the Squid process to force it to pickup the new resolver settings. But what if your DNS server values could change at any future stage without warning?

If you’re using Puppet, you could use a custom fact (like this one) that exposes the current name servers on the system, then writes them into the Squid configuration file using the dns_nameservers configuration parameter and notifies the Squid service to reload on any change of the configuration file.

Or if your squid server is always going to be using a particular DNS server, regardless of what the host is using, you can simply set the dns_nameservers parameter in Squid to point to the desired servers.

ELBs & Corporate Proxies

Following on from yesterday’s ELB post, it’s worth noting that there’s another common scenario where you can trigger issues when accessing ELBs – many corporates enforce the use of an HTTP proxy for all outgoing traffic, sometimes transparently, other times less so.

Having a multi-AZ ELB and accessing from your own data center isn’t too much of an issue, if each host does it’s own DNS lookup, your hosts should roughly end up with a 50/50 split across AZs, as each one resolves it’s own DNS record.

But when a proxy is added to the mix, it breaks this, since proxies tend to do their own DNS lookups and cache the results for use by other clients. Testing with Squid showed that the DNS caching for Squid would favour a particular AZ and send all traffic to that one AZ, before then flipping to the other AZ when the DNS cache expired and was refreshed – in my case, every 5mins when the TTL expired.

I'm so sick of these motherfucking ELBs in this motherfucking cloud!

Go home Amazon, you’re drunk

If you’re using Squid, there’s little you can do to work around this – whilst you can adjust the Squid DNS caching times and approaches, short of disabling DNS caching and taking the performance hit of a DNS lookup for every new outbound request, you will always end up with load jumping between both AZs and causing havoc..

There’s a few workaround options:

  • Have multiple Squid proxies for your outbound traffic and load balance between then on your network, if you load balance outgoing traffic across 4+ different outbound Squid servers, your load should end up going to different AZs a *bit* more evenly – but still not guaranteed.
  • Create an internal ELB and access via your VPC link, allowing you to bypass your company’s outbound network proxies (as traffic routes via VPN or Direct Connect) – but then you’re paying for 2x ELBs – one external for end users and one internal for your own systems.
  • Replace the ELB with something actually useful (eg a Varnish or HA-Proxy instance in Amazon).
  • Get rid of the outbound proxies please! I could write a business case for it based on the amount of money I’ve seen proxies waste at so many different companies (hint: engineers time debugging issues is much more expensive than a couple GB extra data usage).
  • Gin.