I recently went from two huge server racks down to a single speedy home server running KVM virtual machines, with the intent of packaging all my servers (experimental, development, staging, etc.) into a single reliable system that will reduce power and maintenance costs.
As part of this change, I went from having dedicated DHCP & DNS servers to having everything located on the KVM host.
The design I’ve used has the host OS running with minimal services – the host just runs KVM, OpenVPN, DHCP and a DNS caching nameserver – all other services run as guest VMs, with a virtual network for the guests and host to communicate over.
Guests run as DHCP clients and get their network configuration from the host OS, which makes it easy to assign or adjust addressing if needed.
However, this does mean you can’t get away with hammering the host too badly – for example, running an I/O- and network-intensive backup can cause some interesting problems when you also need the host for services such as DHCP.
Take a look at the following log messages from a mostly idle VM – these were taken whilst another VM on the server was running a bonnie++ process to test performance:
Mar 6 10:18:06 virtguest dhclient: 5 bad udp checksums in 5 packets
Mar 6 10:18:27 virtguest dhclient: DHCPREQUEST on eth0 to 10.8.12.1 port 67
Mar 6 10:18:45 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:00 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:07 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:15 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:15 virtguest dhclient: 5 bad udp checksums in 5 packets
That’s some messed up stuff – what you’re seeing is the guest VM trying to renew its DHCP address with the host server, first by asking the host directly (10.8.12.1) and then by broadcasting – but the host is so sluggish from running the I/O intensive virtual machine that it is actually corrupting or dropping the UDP packets, preventing the guest VM from renewing its address.
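To make that pattern easier to spot in a longer syslog, here is a minimal Python sketch that tallies the dhclient messages. It is only an illustration: the function names are made up for this post, the regular expressions are matched against the exact message format shown above, and other dhclient versions may word their messages differently.

```python
import re
from collections import Counter

# Patterns based on the dhclient syslog lines quoted in this post.
REQUEST_RE = re.compile(r"dhclient: DHCPREQUEST on (\S+) to (\S+) port \d+")
CHECKSUM_RE = re.compile(r"dhclient: (\d+) bad udp checksums in (\d+) packets")

BROADCAST = "255.255.255.255"

# The log excerpt from the idle guest, one message per line.
LOG = """\
Mar 6 10:18:06 virtguest dhclient: 5 bad udp checksums in 5 packets
Mar 6 10:18:27 virtguest dhclient: DHCPREQUEST on eth0 to 10.8.12.1 port 67
Mar 6 10:18:45 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:00 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:07 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:15 virtguest dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Mar 6 10:19:15 virtguest dhclient: 5 bad udp checksums in 5 packets
"""


def summarise_dhclient(lines):
    """Count renewal requests, acknowledgements and bad checksums."""
    stats = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            # Requests to the broadcast address mean the client has given
            # up on talking to its original server directly.
            if match.group(2) == BROADCAST:
                stats["broadcast_requests"] += 1
            else:
                stats["unicast_requests"] += 1
            continue
        match = CHECKSUM_RE.search(line)
        if match:
            stats["bad_checksums"] += int(match.group(1))
            continue
        if "dhclient: DHCPACK" in line:
            stats["acks"] += 1
    return stats


def renewal_failing(stats):
    """True when the client is broadcasting and nothing has answered."""
    return stats["broadcast_requests"] > 0 and stats["acks"] == 0


if __name__ == "__main__":
    stats = summarise_dhclient(LOG.splitlines())
    print(dict(stats), "renewal failing:", renewal_failing(stats))
```

For the excerpt above this counts one unicast request, four broadcast requests, ten bad checksums and no acknowledgements, which is the signature of a renewal that is going nowhere.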
This of course raises the most important question: What happens if the guest can’t renew its IP address?
In this case, the Linux/CentOS 5 guest VM actually completely lost its IP address after a long period of DHCPREQUEST attempts, fell off the network entirely and caused my phone to go nuts with Nagios alerts.
Now of course in any sane production environment, nobody would be running a bonnie++ process on a VM on an active server – however there are some pretty key points still made here:
- The isolation is a lie: Guests are only *somewhat* isolated from one another – one guest can still mess with another and effectively denial-of-service attack the other VMs by utilising all the available resources.
- Guests can be jerks: Organisations running KVM (or other virtualisation systems) with untrusted guest VMs should carefully consider how they are going to monitor and protect the service from users running crazily resource intensive processes (after all, there will be someone who wants to bonnie++ test their new VM simply for the lols).
- cgroups to the rescue? Linux cgroups do have an I/O controller (blkio-cgroup), but whilst this controls read/write flow, it won’t restrict seeks, which can also badly impact spinning rust based servers.
- WTF DHCP? The approach of the guests simply dropping their DHCP address after losing contact with the DHCP server is a pretty bad design limitation – if the DHCP server is unreachable, the guest should keep the original address (of course if the “physical” ethernet connection dropped, that would be a different situation, and it should drop its address to match).
- Also: I wonder which OSes/distributions have the above behaviour?
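For context on the DHCP point, the unicast-then-broadcast pattern in the log and the eventual loss of the address match the client timers described in RFC 2131: by default the client starts renewing with its own server at 50% of the lease (T1), falls back to broadcasting at 87.5% (T2), and has to stop using the address once the lease runs out. The sketch below works through that timeline in Python. The one hour lease is a made-up figure for illustration, not the value my DHCP server hands out, and the "EXPIRED" label is shorthand for the client returning to its initial state.

```python
def lease_timers(lease_seconds, t1_fraction=0.5, t2_fraction=0.875):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    return lease_seconds * t1_fraction, lease_seconds * t2_fraction


def client_state(elapsed_seconds, lease_seconds):
    """Name the phase a DHCP client is in, given time since the lease began."""
    t1, t2 = lease_timers(lease_seconds)
    if elapsed_seconds < t1:
        return "BOUND"        # address in use, no traffic to the server
    if elapsed_seconds < t2:
        return "RENEWING"     # unicast DHCPREQUEST to the original server
    if elapsed_seconds < lease_seconds:
        return "REBINDING"    # broadcast DHCPREQUEST to any server
    return "EXPIRED"          # lease over: the client gives up the address


if __name__ == "__main__":
    lease = 3600  # hypothetical one hour lease
    for minute in (0, 29, 30, 52, 53, 59, 60):
        print(f"{minute:2d} min: {client_state(minute * 60, lease)}")
```

Seen this way, a longer lease does not fix a host that cannot answer DHCP, but it does widen the window between T1 and expiry in which a renewal can still succeed.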
I’m currently running a number of bonnie++ tests on my KVM server and will have a blog post in the near future covering the findings in more detail. I’m also planning to look into cgroups and other resource control and limiting functions, and will report back on how these fare when you have guest VMs running heavy processes.
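As a rough idea of what the cgroups side could look like, here is an untested-on-this-server Python sketch that writes a write-bandwidth limit into the cgroup v1 blkio throttle interface. Several things are assumptions: that the host kernel has blkio throttling support, that the blkio controller is mounted at the path shown in the usage comment, and that a cgroup called "guests" exists. The helper names are invented for illustration, and writing to the real control file needs root.

```python
import os
from pathlib import Path

# Control file used by the cgroup v1 blkio throttling policy.
WRITE_BPS_FILE = "blkio.throttle.write_bps_device"


def device_numbers(device_path):
    """Return (major, minor) for a block device node such as /dev/sda."""
    st = os.stat(device_path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def throttle_rule(major, minor, bytes_per_second):
    """Format a rule as '<major>:<minor> <bytes_per_second>'."""
    return f"{major}:{minor} {bytes_per_second}"


def apply_write_throttle(cgroup_dir, major, minor, bytes_per_second):
    """Write a write-bandwidth limit into the given blkio cgroup directory."""
    rule = throttle_rule(major, minor, bytes_per_second)
    (Path(cgroup_dir) / WRITE_BPS_FILE).write_text(rule + "\n")
    return rule


# Hypothetical usage, limiting writes to /dev/sda to 10 MB/s for the
# "guests" cgroup (the mount point and group name are assumptions):
#
#   major, minor = device_numbers("/dev/sda")
#   apply_write_throttle("/sys/fs/cgroup/blkio/guests",
#                        major, minor, 10 * 1024 * 1024)
```

This only caps throughput, so it does nothing about the seek problem mentioned above. That is one of the things I want to measure.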
Overall it made my weekend of geekery that bit more exciting. :-D