
Nginx, reverse proxies and DNS resolution

Nginx is a pretty awesome high performance web server and reverse proxy. It’s often used in conjunction with other HTTP servers such as Java/Tomcat and Ruby/Unicorn, as it allows static content to be served directly from disk by Nginx and connections from slow clients to be queued and buffered by Nginx, rather than taking up the time of the expensive/scarce application server worker processes.


A typical Nginx reverse proxy configuration to a single backend using proxy_pass to a local HTTP server application on port 8080 would look something like this:

server {
    ...
    proxy_pass http://localhost:8080;
    ...
}

Another popular approach is having a defined upstream group (which can be used for multiple servers, or a single one if desired), for example:

upstream upstream-localhost {
    server localhost:8080;
}

server {
    ...
    proxy_pass http://upstream-localhost;
    ...
}

Generally this configuration works fine for most of our use cases – we typically have a 1-to-1 mapping between a backend application server and Nginx, so the configuration is very simple and reliable – any issues are usually with the backend application, rather than Nginx itself.


However there are occasions when it’s desirable to have Nginx talking to a backend on another server.

I recently implemented an OAuth2 gateway using Nginx-Lua, with the Nginx gateway doing the OAuth2 authentication in a small Lua module before passing the request through to the backend application. This configuration ran on a pair of bastion servers, which reverse proxy the request through to an Amazon ELB which load balances a number of application servers.

This works perfectly 95% of the time, but Amazon ELBs (even internal ones) have a tendency to change their IP addresses. Normally this doesn’t matter, since you never reference ELBs via their IP addresses, only their DNS names – however the default behaviour of the Nginx upstream and proxy modules is to resolve DNS at startup only, and not to re-resolve it while the application is running.

This leads to a situation where the Amazon ELB IP address changes and Amazon updates the DNS record, but Nginx never re-resolves it and stays pointing at the old IP address. Subsequently, requests to the backend start failing once Amazon drops services from the old IP address.

This lack of backend re-resolution is a known limitation/issue with Nginx. Thankfully there is a workaround, as per this mailing list post: set proxy_pass to a variable, which forces re-resolution of the DNS name at request time, as Nginx treats variables differently to static configuration.

server {
    ...
    resolver 127.0.0.1;
    set $backend_upstream "http://dynamic.example.com:80";
    proxy_pass $backend_upstream;
    ...
}


When using parametrised backends like this, a resolver (DNS server address) must also be configured in Nginx – it is unable to use the local OS resolver, and the directive must point directly at a name server IP address.

If your name servers aren’t predictable, you could install something like dnsmasq to provide a local resolver on 127.0.0.1 which then forwards to the dynamically assigned name server, or take the approach of pulling the name server details from the host using something like Puppet Facts and then writing it into the configuration file when it’s generated on the host.
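
As a sketch of the dnsmasq approach, /etc/dnsmasq.conf needs very little – by default dnsmasq forwards queries to whatever name servers /etc/resolv.conf currently lists and re-reads that file when it changes, so it mostly just needs telling where to listen:

listen-address=127.0.0.1

Nginx’s resolver directive can then be pointed at 127.0.0.1 as per the example above.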

Nginx >= 1.1.9 will re-resolve DNS records based on their TTL, but it’s possible to override this with any value desired.
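
For example, the resolver directive takes a valid parameter which overrides the record’s TTL – a sketch using the resolver from the earlier configuration, forcing re-resolution every 30 seconds:

resolver 127.0.0.1 valid=30s;

To verify correct behaviour, tcpdump will quickly show whether re-resolution is working: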

# tcpdump -i eth0 port 53
15:26:00.338503 IP nginx.example.com.53933 > 8.8.8.8.domain: 15459+ A? dynamic.example.com. (54)
15:26:00.342765 IP 8.8.8.8.domain > nginx.example.com.53933: 15459 1/0/0 A 10.1.1.1 (70)
...
15:26:52.958614 IP nginx.example.com.48673 > 8.8.8.8.domain: 63771+ A? dynamic.example.com. (54)
15:26:52.959142 IP 8.8.8.8.domain > nginx.example.com.48673: 63771 1/0/0 A 10.1.1.2 (70)

It’s a bit of an annoyance in an otherwise fantastic application, but as long as you are aware of the limitation, it’s not too difficult to work around with a bit of configuration adjustment.

virt-viewer remote access tricks

Sometimes I need to connect directly to the console of my virtual machines – typically when working with development or experimental VMs where SSH/RDP/VNC isn’t working for whatever reason, or when I’m installing a new OS entirely.

To view virtual machines running under libvirt (whether KVM or Xen), you use the virt-viewer command, which launches a window and establishes a VNC or SPICE connection into the virtual machine.

Historically I’ve just run this by SSHing into the virtual machine host and then using X11 forwarding to display the virtual machine window on my laptop. However this performs really badly on slow connections, particularly 3G where it’s almost unusable, as X11 forwarding was never designed to be efficient over high-latency links.

However virt-viewer has the capability to run locally and connect to a remote server, either directly to the libvirt daemon, or via an SSH tunnel. To do the latter, the following command will work for KVM (qemu) based hypervisors:

virt-viewer --connect qemu+ssh://user@host.example.com/system vmnamehere

With the above, you’ll have to enter your SSH password twice – first to establish the connection to the hypervisor and secondly to establish a tunnel to the VM’s VNC/SPICE session – you’ll probably quickly decide to get some SSH keys/certs set up to prevent annoyance. ;-)
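
If you haven’t done so already, the standard OpenSSH key setup only takes a minute:

ssh-keygen -t rsa
ssh-copy-id user@host.example.com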

This performs way faster than X11 forwarding, plus the UI of virt-viewer stays much more responsive, including grabbing/ungrabbing of the local keyboard/mouse, even if the connection or server is lagging badly.

If you’re using Xen with libvirt, the following should work (I haven’t tested this, but based on the man page and some common sense):

virt-viewer --connect xen+ssh://user@host.example.com/ vmnamehere

If you wanted to open up the right ports on your server’s firewall and are sending all traffic via a secure connection (eg VPN), you can drop the +ssh and use --direct to connect directly to the hypervisor and VM without port forwarding via SSH.
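
As a sketch of that – assuming libvirtd has been configured to listen on TCP, which it doesn’t do by default – the command would look something like:

virt-viewer --direct --connect qemu+tcp://host.example.com/system vmnamehere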

How Jethro Geeks – IRL

A number of friends are always quite interested in how my personal IT infrastructure is put together, so I’m going to try and do one post a week, covering areas ranging from physical environments and desktops through to applications, server environments, monitoring and architecture.

Hopefully this is of interest to some readers. I’ll be upfront and advise that not everything is perfect in this setup – like any large environment there are always ongoing upgrade projects, and considering my environment is larger than some small ISPs’, it’s not surprising that there are areas of poor design or legacy components. However I’ll try to be honest about these deficiencies and where I’m working to make improvements.

If you have questions or things you’d like to know my solution for, feel free to comment on any of the posts in this series. :-)


Today I’m examining my physical infrastructure, including my workstation and my servers.

Following my move to Auckland, it’s changed a lot since last year and is now based primarily around my laptop and gaming desktop.

All the geekery, all the time

This is probably my most effective setup yet – the table was an excellent investment at about $100 off Trademe, with enough space for 2 workstations plus accessories in a really comfortable and accessible form factor.


My laptop is a Lenovo Thinkpad X201i, with an Intel Core i5 CPU, 8GB RAM, 120GB SSD and a 9-cell battery for long run time. It was running Fedora, but I recently shifted to Debian so I could upskill on the Debian variations some more, particularly around packaging.

I tend to dock it and use the external LCD mostly when at home, but it’s quite comfortable to use directly and I often do when out and about for work – I just find it’s easier to work on projects with the larger keyboard & screen so it usually lives on the dock when I’m coding.

This machine gets utterly hammered – I run this laptop 24×7 and typically have to reboot about once a month, usually due to a system crash when docking or suspending/resuming – something I blame the crappy Lenovo BIOS for.


I have an older desktop running Windows XP for gaming. It’s a bit dated now with only a Core 2 Duo and 3GB RAM – kind of due for a replacement, but it still runs the games I want quite acceptably, so there’s been little pressure to replace it – plus since I only really use it about once a week, it’s not high on my investment list compared to my laptop and servers.

Naturally, there are the IBM Model M keyboards for both systems – I love these keyboards more than anything (yes Lisa, more than anything <3 ) and I’m really going to be sad when I have to work in an office with other people again who don’t share my love for loud clicky keyboards.

The desk is a bit messy ATM with several phones and routers lying about for some projects I’ve been working on – I go through stages ranging from extreme OCD tidiness to surrendering to the chaos… fundamentally I just have too much junk to fit on it, so I’m trying to downsize the amount of stuff I have. ;-)


Of course these are just my workstations – there’s a whole lot going on in the background with my two physical servers, where the real stuff happens.

A couple of years back, I had a lab with 2x 42U racks which I really miss. These days I’m running everything on two physical machines using Xen and KVM virtualisation for all services – it was just so expensive and difficult having the racks. I’d consider doing it again if I bought a house, but when renting it’s far better to be as mobile as possible.

The primary server is my colocation box which runs in a New Zealand data center owned by my current employer:

Forever Alone :'( [thanks to my colleagues for that]

It’s an IBM xSeries 306m, with a 3.0GHz P4 CPU, 8GB of RAM and 2x 1TB enterprise-grade SATA drives, running CentOS (RHEL clone). It’s not the fastest machine, but it’s more than speedy enough for running all my public-facing production services.

It’s a vendor box as that enabled me to have 3 yrs onsite NBD repair support for it; these days I keep a complete hardware spare onsite, since it’s too old to be supported by IBM any longer.

To provide security isolation and easier management, services are spread across a number of Xen virtual machines based on type and risk of attack – this machine runs around 8 virtual machines performing different publicly facing services, including my mail servers, web servers, VoIP, IM and more.


For anything not public-facing or critical production, there’s my secondary server, which is a “whitebox” custom build running a RHEL/CentOS/JethroHybrid with KVM for virtualisation, running from home.

Whilst I run this server 24×7, it’s not critical for daily life, so I’m able to shut it down for a day or so when moving house or internet providers and not lose my ability to function – having said that, an outage of more than a couple of days does get annoying fast…

Mmmmmm my beautiful monolith

This attractive black monolith packs a quad-core Phenom II CPU, custom cooler, 2x SATA controllers, 16GB RAM and 12x 1TB hard drives in a full tower Lian Li case (slightly out-of-date spec list).

I’m running RHEL with KVM on this server which allows me to run not just my internal production Linux servers, but also other platforms including Windows for development and testing purposes.

It exists to run a number of internal production services, file shares and my whole development environment, including virtual Linux and Windows servers, virtual network appliances and other test systems.

These days it’s getting a bit loaded – I’m using about 1 CPU core for RAID and disk encryption and usually 2 cores for regular VM operation, leaving about 1 core free for load fluctuations. At some point I’ll have to upgrade, in which case I’ll replace the motherboard with a new one that takes 32GB RAM and a hex-core processor (or maybe octo-core by then?).


To avoid nasty sudden poweroff issues, there’s an APC UPS keeping things running and a cheap LCD and ancient crappy PS/2 keyboard attached as a local console when needed.

It’s a pretty large full tower machine, so I expect to be leaving it in NZ when I move overseas for a while, as it’s just too hard to ship and move around with – if I end up staying overseas longer than originally planned, I may need to consider replacing both physical servers with a single colocated rackmount box to drop running costs and to solve the EOL status of the IBM xSeries.


The little black box on the bookshelf with antennas is my Mikrotik Routerboard 493G, which provides wifi and wired networking for my flat, with a GigE connection into the server which does all the internet firewalling and routing.

Other than the Mikrotik, I don’t have much in the way of production networking equipment – all my other kit is purely for development and isn’t always connected, and a lot of the development kit I now run as VMs anyway.


Hopefully this is of some interest – I’ll aim to do one post a week about my infrastructure in different areas, so add this blog to your RSS reader for future updates. :-)

Exchange, I will have my revenge!

It’s been a busy few weeks – straight after my visit to Christchurch I got stuck into the main migration phase of a new desktop and server deployment for one of our desktop customers.

It wasn’t a small bit of work – going from 20 independent 7-year-old Windows XP desktops to shiny new Windows 7 desktops, and moving from Scalix/Linux to Exchange/Win2008R2. It’s not the normal sort of project for me – usually I’ll be dealing with network systems and *nix servers rather than Microsoft shops – but I had some free time and knew the customer site well, so I ended up getting the project.

The deployment was mostly straightforward, and I intend to blog about it in the near future – I honestly found some of the MS tech such as Active Directory quite nice, and it’s interesting comparing the setup to what’s possible in the Linux environment.

However I still have no love for Microsoft Exchange, which has to be one of the most infuriating email systems I’ve had to use. We ended up going with Exchange for this customer due to it working the easiest with their MS-centric environment and providing benefits such as ActiveSync for mobiles in future.

However, coming from a Linux background and having grown up with solid, easy-to-debug and easy-to-monitor platforms like Sendmail, Postfix and Dovecot, I find Exchange an exercise in obscure configuration and infuriating functionality.

To illustrate my point, I’m going to walk you through a fault we had with this new setup several days after switching over to the Exchange server…

* * *

On one particular day, after several days of no problems, the Exchange server suddenly decided it didn’t want to email the upstream smarthost mail server.

The upstream server in question has both IPv4 and IPv6 addresses – something that you tend to want in the 21st century – and it’s pretty rare that we have problems with it.

With Exchange 2010 and Windows Server 2008, both components have IPv6 enabled out-of-the-box. We don’t have IPv6 at this particular customer, since the ISP hasn’t extended IPv6 beyond the core & colo networks, so we can’t allocate ranges to our customers at this stage.

For some unknown reason, the Windows server decided that it would make sense to try connecting to the smart host via its IPv6 AAAA record, despite there being no actual upstream IPv6 connectivity. To make matters worse, it then decided the next most logical thing was to just fail, rather than falling back to the IPv4 A record.
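
For reference, checking which record types a smarthost actually publishes is easy enough (the hostname here is a placeholder):

nslookup -type=A smarthost.example.com
nslookup -type=AAAA smarthost.example.com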

The Windows experts assigned to look at this issue decided the best solution was to “disable IPv6 in Exchange”, something I assumed meant “tell Exchange not to use IPv6 for smarthosts”.

With the change made, no faults occurring and emails flowing, the issue was checked off as sorted. :-)

Later that night, the server was rebooted to make some changes to the underlying KVM platform – however after rebooting, the Windows server didn’t come back up properly. Instead it was stuck for almost two hours at “Applying computer settings….” at boot – and even once the login screen appeared, it would still take another 30 mins before I could log in.

This is the digital equivalent of watching paint dry.

After eventually logging in, the server revealed the cause of the slow startup: the “microsoft.exchange.search.exsearch.exe” process running non-stop at 100% CPU.

After killing off that process to get some semblance of a responsive system, it became apparent that a number of key Exchange components were also not running.

I waded through the maze that is Event Viewer to find a number of Exchange errors, in particular one about being unable to connect to Active Directory LDAP, with an error of DSC_E_NO_SUITABLE_CDC (Error 0x80040a02, event 2114).

Every time I have to use event viewer I miss syslog, tail and grep even more.

Naturally the first response was to review what changes had been made on the server recently. After confirming that no updates had been made in the last couple of days, the only recent change was the IPv6 adjustment made by the Windows engineers earlier in the day.

Reading up on IPv6 support and Windows Server 2008, I came across this gem on microsoft.com:

"From Microsoft's perspective, IPv6 is a mandatory part of the Windows
operating system and it is enabled and included in standard Windows
service and application testing during the operating system development
process. Because Windows was designed specifically with IPv6 present,
Microsoft does not perform any testing to determine the effects of
disabling IPv6. If IPv6 is disabled on Windows Vista, Windows Server
2008, or later versions, some components will not function."

I then came across this blog post from someone who had experienced the same error string, but with a different cause. In his post, the author had a handy footnote:

"The biggest red herring I found when troubleshooting this one from
articles others had posted was related to IPv6. I see quite a few people
suggesting IPv6 is required for Exchange 2007 and 2010. This is NOT
true. As a matter of fact, if the server hosting Exchange 2007 or 2010
is a DC, then IPv6 must be enabled otherwise simply uncheck the checkbox
in TCP/IP properties on all connected interfaces. You don't need to
buggar with the registry to "really disable it"....just uncheck the
checkbox."

The customer’s Windows 2008 R2 server is responsible for running both Exchange 2010 and Active Directory.

To resolve the smart host issues, the Windows team had disabled IPv6 altogether on the network interface, resulting in a situation where Exchange was unable to establish a connection to AD to get the information needed to start up and run.

To resolve this, I simply re-enabled IPv6 on the interface, and the Exchange processes correctly started themselves within 10 seconds or so as I watched in the Services utility.

This resolved the “Exchange isn’t functioning at all” issue, but still left me with the smarthost IPv6 issue. To work around it for now, I just set the smarthost in Exchange to use the IPv4 address directly, but will need a better fix long term.
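
For reference, that workaround is a one-liner from the Exchange Management Shell – this is just a sketch, with the connector name and address being placeholders:

Set-SendConnector -Identity "Outbound Mail" -SmartHosts "192.168.10.1"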

With the issue resolved, some post-incident considerations:

  1. I’m starting to see more cases where a *lack* of IPv6 is actually causing more problems than the presence of it, particularly around mail servers.
  2. Exchange has some major architectural issues – I would love to know why an internal communication issue caused the search indexer process to go nuts at 100% CPU for hours. I’ve broken Linux boxes in terrible ways before, particularly with LDAP server outages leaving boxes unable to get any user information – they just error out slowly with timeouts, they don’t go and start chewing up 100% CPU. And I can drop them into a lower run level to fix and reboot within minutes instead of hours.
  3. I did a search and couldn’t find any official Microsoft best practice documentation for Server 2008, nor did Windows Server warn the admin that disabling IPv6 would break key services.
  4. If Microsoft has published anything like this, it’s certainly not easy to find – microsoft.com is a complete searching disaster. And yes, whilst they have a “best practice analyzer tool”, it’s not really what I want as an admin – I want a doc I can review and check plans against.
  5. I’m seriously tempted to start adding surcharges for providing support for Microsoft platforms. :-/

* * *

Overall, Exchange certainly hasn’t put itself in my good books – issues like the IPv6 requirement are understandable, but the side effect of the search indexer going nuts on CPU makes no sense, and it’s pretty concerning that the code doesn’t just go “oh, I can’t connect, I’ll close/sleep till later”.

So sorry Microsoft, but you won’t see me becoming a Windows Server fanboy at any stage – my Linux Sendmail/Dovecot setup might not have some of Exchange’s flashier features, but it’s damn reliable, extremely easy to debug and logs in a clear and logical fashion. I can trust it to behave predictably, and that’s worth more to me than the features.

DAViCal, awkward name, great features

A recurring theme of this blog is that I love to be able to use open standards and open source for storing and accessing my information – the biggest example is of course IMAP for email, but I also use tools such as Mozilla Sync Server for self-reliant synchronization and backup of client device information, without using external cloud providers.

I’ve been a user of Evolution for almost a decade now – sometimes criticized as the “Outlook of Linux”, Evolution provides mail, calendaring, contacts and todo lists to the GNOME desktop, with a pretty large but sometimes slightly buggy feature set. For me personally, it’s always done a great job and it’s my key business productivity tool.

I moved all my mail onto an IMAP server years ago, which makes it easy to shift clients if I ever need to – in my case, pretty much just needing to access mail from both my laptop and smart phone.

However this hasn’t been the case for other key data such as calendaring and contacts. A few years ago, the open source calendaring solutions available weren’t that well developed, and many clients suffered limitations such as read-only functionality.

Thankfully this has been changing – most clients (*glares angrily at Microsoft Outlook*) now support CalDAV and CardDAV quite reliably, which gives us an open standard that works across different programs, platforms and device types.

  • CalDAV is an open standard for the exchange of calendaring and task/todo/memo information between a client and a server.
  • CardDAV is an open standard for the exchange of contact/address book information between a client and a server.

These two standards have a number of implementations, both open source and proprietary. Of note are Apple Calendar Server, which is Apple’s open source implementation, and DAViCal, an open source LAMP-based server solution that is becoming quite popular.

I’ve used both solutions – my employer runs an Apple Calendar Server after getting fed up with not having free/busy between engineers. Whilst we ended up running a MacOS server, the Linux ports have improved and there are resources for setting it up on a Linux or even BSD host.

Apple Calendar Server works reasonably well with Evolution – I never have any issue booking events – however Evolution appears unable to accept or deny meeting requests, forcing me to go to the calendar server’s web interface, which is actually pretty horrific.

I decided I wouldn’t use it for my own personal calendaring – even if I went to the effort of porting it onto my Linux servers, it wouldn’t really be the solution I ideally want, as it lacks a lot of features and isn’t as easy to configure as other Linux services.

Instead I had a look at DAViCal. It’s a feature-packed calendaring and contacts application developed primarily by Morphoss in Wellington NZ, started by Andrew McMillan of ex-Catalyst IT fame.

Despite having an annoyingly tricky name to type (you try typing it for the 100th time at 3am without typoing on the capitalization!!), the software itself appears reliable and worked across a number of devices when I ran tests.

It’s not perfect – I have some issues with the user interface design, which whilst very functional and effective, isn’t that intuitive to a new user, exposing far too many options to them at the beginning. Ideally it would have a simple/advanced option, so a user who just wants to add user calendars and do basic stuff can do so, then dig into more detailed ACLs, tokens, shared calendars, etc as needed.

Naturally it’s open source, so I should stop complaining and hack up some code to demonstrate what I think might be better. Maybe if people would stop stealing my car I’d have time to get something done. :-/

Main Screen Turn On! (Maybe some more 1-2-3 clear setup flows here would be nice, the wall of text is kind of offputting for visual people like myself).

Options! All the options!

The web-based interface is only for administration – there isn’t a web-based calendar app provided with DAViCal; instead, choose any CalDAV client you wish to use with it, whether it’s web-based or client-side.

I haven’t given DAViCal’s feature set a full workout yet – at this stage I’ve just set up my personal calendar, contacts and todo list on both Evolution and my Android ICS phone, but haven’t touched meeting requests, shared calendars or free/busy information.
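
For anyone pointing clients at a DAViCal install, collections live under caldav.php paths – a sketch of the CalDAV URL format, with hostname and username as placeholders (“home” being the default calendar collection DAViCal creates):

https://davical.example.com/caldav.php/username/home/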

Partly my testing is a bit limited since I’m only running Evolution 2.30.3 (Debian Stable), which is a little outdated – it looks like some of the functionality that’s missing/broken there might not be an issue any more.

On the mobile side, I’m using “aCal”, an open source Android application written by the DAViCal developer, providing a CalDAV calendar, todo list and read-only contact/address book synchronization.

This now means I can add, edit and delete calendar and task entries on either my Android phone or my Linux laptop via Evolution and have them propagate to the other device – although unfortunately this is based on polling, rather than push (it looks like push is theoretically possible via an extension to the standard, and works with iCal).

Tasks & calendar entries in a bright sunny UI

I can also get read-only copies of all my contact information from Evolution synced through to my phone, but sadly there isn’t support for editing contacts on the Android phone just yet.

I did also consider using LDAP for my address book entries, but CardDAV looks like a better-designed solution – it’s very rare that I don’t see “LDAP” and “headache” mentioned in the same sentence, and this comes from someone maintaining and supporting LDAP enterprise environments…

Essentially the main problem with LDAP is that there isn’t an exact standard for address entries, so what works for one client might not work 100% for another – along with there being a limited selection of decent applications for actually managing LDAP address databases.

Also, some clients treat LDAP as if it’s going to be a million+ record store and expose a different UI compared to that of smaller address books, which harms the user experience (*glares at Evolution*).

aCal & Android ICS address book integration - note the uneditable edit screen on the right, read-only for now :-(

The other main issue with aCal is that it doesn’t sync with the native OS calendar program, but instead provides its own. Digging through the documentation and mailing list, this appears to be due to the native application lacking support for some of the functionality needed for a proper CalDAV implementation, so a sync solution would leave certain features missing – although I’d still like the option.

Of course these are limitations of aCal, not DAViCal or the standards themselves – there are some other CalDAV & CardDAV sync programs available in the Android market under non-open licenses, which you have the option of trying.

The nice thing about using standards is that you can have multiple vendors competing to make the best product/tool for their customer’s needs, not simply using lock-in to maintain/force a customer base. :-)

Overall DAViCal seems really nice and in my testing has been quite reliable – I’m now moving on to more rigorous testing and am in the process of migrating my calendar and contacts information into it. Once I start using it daily in the real world, the true testing begins.

I’m keen to take a look at what options I have around exposing some information publicly, eg sharing free/busy schedule information with friends on different servers.

DHCP, I/O and other virtualisation fun with KVM

I recently shifted from having two huge server racks down to a single speedy home server running KVM virtual machines, with the intent of packaging all my servers – experimental, development, staging, etc – into a single reliable system which will reduce power and maintenance costs.

As part of this change, I went from having dedicated DHCP & DNS servers to having everything located on the KVM host.

The design I’ve used has the host OS running minimal services – the host just runs KVM, OpenVPN, DHCP and a DNS caching nameserver – all other services run as guest VMs, with a virtual network for the guests and host to communicate over.

Guests run as DHCP clients, getting their network information from the host OS – this makes it easy to assign or adjust addressing if needed.
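
On the CentOS guests this is just the stock DHCP client configuration – an illustrative /etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes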

However this does mean you can’t get away with hammering the host too badly – for example, running an I/O and network intensive backup can cause some interesting problems when you also need the host for services such as DHCP.

Take a look at the following log messages from a mostly idle VM – these were taken whilst another VM on the server was running a bonnie++ process to test performance:

Mar  6 10:18:06 virtguest dhclient: 5 bad udp checksums in 5 packets
Mar  6 10:18:27 virtguest dhclient: DHCPREQUEST on eth0 to 10.8.12.1 port 67
Mar  6 10:18:45 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar  6 10:19:00 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar  6 10:19:07 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar  6 10:19:15 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar  6 10:19:15 virtguest dhclient: 5 bad udp checksums in 5 packets

That’s some messed-up stuff – what you’re seeing is the guest VM trying to renew its DHCP lease with the host server, but the host is so sluggish from running the I/O intensive virtual machine that it is actually corrupting or dropping the UDP packets, preventing the guest VM from renewing its address.
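
If you want to watch this happening yourself, running tcpdump on the host side of the virtual network shows the failing exchanges – the interface name here assumes libvirt’s default bridge, and -vv makes tcpdump validate the UDP checksums:

# tcpdump -vv -i virbr0 port 67 or port 68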

This of course raises the most important question: what happens if the guest can’t renew its IP address?


In this case, the Linux/CentOS 5 guest VM actually completely lost its IP address after a long period of DHCPREQUEST attempts, fell off the network entirely and caused my phone to go nuts with Nagios alerts.

Now of course in any sane production environment nobody would be running bonnie++ processes on a VM on an active server – however there are some pretty key points still made here:

  • The isolation is a lie: Guests are only *somewhat* isolated from one another – one guest can still mess with another and effectively denial-of-service attack the other VMs by utilising all the available resources.
  • Guests can be jerks: Organisations running KVM (or some other systems) with untrusted guest VMs should carefully consider how they are going to monitor and protect the service from users running crazily resource intensive processes. (after all, there will be someone who wants to bonnie++ test their new VM simply for the lols).
  • cgroups to the rescue? Linux cgroups does have an I/O controller (blkio-cgroup) which can control read/write flow, although it won’t restrict seeks, which can also badly impact spinning rust based servers – see the sketch after this list.
  • WTF DHCP? The approach of the guests simply dropping their DHCP address after losing contact with the DHCP server is a pretty bad design limitation – if the DHCP server is unreachable, the client should keep its original address (of course if the “physical” ethernet connection dropped, that would be a different situation, and dropping the address to match would make sense).
  • Also: I wonder what OSes/distributions have the above behavior?
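
As a rough sketch of that cgroups approach – assuming the blkio controller is mounted at /sys/fs/cgroup/blkio and that the VM images live on host disk 8:0 (/dev/sda) – you could cap a misbehaving guest like so:

mkdir /sys/fs/cgroup/blkio/virtguest
# cap reads and writes at 10MB/s against the host disk holding the VM images
echo "8:0 10485760" > /sys/fs/cgroup/blkio/virtguest/blkio.throttle.read_bps_device
echo "8:0 10485760" > /sys/fs/cgroup/blkio/virtguest/blkio.throttle.write_bps_device
# move the guest's qemu-kvm process into the cgroup (PID is a placeholder)
echo 12345 > /sys/fs/cgroup/blkio/virtguest/tasks

As noted above, this throttles throughput rather than seeks, so it’s only partial protection.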

I’m currently running a number of bonnie++ tests on my KVM server and will have a blog post in the near future detailing these findings in more detail. I’m also planning to look into cgroups and other resource control and limiting functions, and will report back on how these fare when you have guest VMs running heavy processes.

Overall it made my weekend of geekery that bit more exciting. :-D

CentOS, RHEL and future possibilities?

Those who know me will know that I’m a long-term CentOS user – this actually started from my love of RHEL, back in my early Linux-using days when I was running Red Hat 8.0.

Whilst it made financial sense for Red Hat to switch to making their product available only in binary form to their customers, at the same time I can’t help but feel this has damaged the appeal of Red Hat for geeks like myself – I’m no longer able to set up friends, family or customers without the funds for RHEL with a quality, enterprise-grade free (as in beer + freedom) distribution.

I do wonder if this contributes to reduced market awareness in the small business space, and also whether it reduces the likelihood of geeks like myself promoting the software – after all, if I can’t run RHEL myself, I’m likely to look at other distributions and options and end up promoting those.

With the lack of a free Red Hat enterprise-grade distribution, there are only a couple of options for those wanting a Red Hat-style experience:

  1. Fedora – the community-developed distribution that forms the future base of RHEL, a fantastic distribution in its own right, but with only 12 months of support per release it’s not suitable for server deployments.
  2. CentOS – the community free re-spin of RHEL with their trademarks removed to make it legally redistributable.

I’ve been using CentOS heavily on my servers and Fedora on my workstations, however there are a number of delays to security updates that concern me about CentOS, which have recently been highlighted in an LWN article.

Essentially, the core problem is that the latest version of CentOS is still only 5.5, whilst Red Hat has had 5.6 out for some time, with numerous security updates in it that have yet to be released for CentOS…

Having systems vulnerable to known exploits with no upstream patches is always a pretty serious concern to any system administrator… This is leading me to re-think my usage of CentOS and to consider whether I should look at other platforms.

I’ve never been a huge fan of Debian in the past, but I’m considering giving it a more detailed look and try – Debian has the advantage of a strong community (like Fedora has) without the limitation of a short support life – although then again, Debian’s releases and support spans are a little less rigid than Red Hat’s, which is somewhat annoying.

There are a few server platforms that come to mind – Ubuntu LTS, Mint Linux, Debian, openSUSE or, of course, Fedora.

The other option is that I could spin my own distribution – based on the number of custom RPMs I already build, rebuilding Red Hat’s update packages for my own needs wouldn’t be too hard, but I really don’t want to get caught up in distribution maintenance for the next 5 years, plus it’s not suitable for customer deployments – so even if I decide that a custom-built system is best for me, it still doesn’t solve the “what do I install for others?” question.

Maybe I need Fedora LTS – long term support for specific versions of Fedora – 3 or 5 years would be wonderful and would meet the needs of server administrators.

This was tried once before with the Fedora Legacy project, but it didn’t last long – possibly the goal of supporting *all* the releases was too much to reasonably handle, so an approach of selecting only even or odd numbered releases might make it more feasible – I know that I’d certainly be willing to contribute.

Anyway, this is a late-night concerned-system-administrator brain dump about the problem – I’m interested in thoughts and comments from others here about the distributions they use or would consider in the server environment.