Monthly Archives: January 2011

New KVM Server :-D

As per my recent post about how I have more computers than a small country, I’ve taken the step of building a new server to run at home for all my development, backup and VM storage.

I’ve managed to condense my server racks of stuff down into one huge tower case, taking 10 hard drives and up to an EATX server motherboard – whilst it may be pretty big, it’s nowhere near as large as a small data centre is. ;-)

The shiny black monolith of awesomeness

Hardware specifications are:

  • ASUSTek M4A78T-E Motherboard
  • AMD Phenom II X4 810 CPU (4 cores, single socket)
  • 12GB DDR3 RAM (planning to boost to max of 16GB)
  • 4x 1TB 7200RPM SATA drives for archival/file storage. (RAID 6)
  • 6x 160GB 7200RPM SATA drives for virtual machine space. (RAID 5)
  • 2x 4-port SATA controller cards (PCIe 4x)
  • Lian Li PC-A71F chassis + additional 4x 3.5″ hotswap chassis.
  • Vantec Ion2+ 600W modular PSU
  • NexStar SATA docking bay + 2x 2TB 5400RPM SATA drives for external offsite backup purposes.

Software Specifications are:

  • RHEL 6 Beta x86_64 (yes, you heard correctly – running beta + jethro hax to get newer version of KVM, with plans to upgrade to CentOS 6 once released)
  • Full disk encryption across all drives to prevent data theft should physical access be compromised.
  • KVM virtualisation – all my previous systems have been Xen, but with the newer hardware I had an opportunity to upgrade to KVM – which is great, I’m finding it far less buggy than Xen has ever been.
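For the curious, getting the disk encryption going is roughly the following – a hedged sketch only, with placeholder device names rather than my actual layout:

```shell
# Encrypt a RAID array with LUKS (device names are examples only)
cryptsetup luksFormat /dev/md0           # prompts to set the passphrase
cryptsetup luksOpen /dev/md0 crypt_data  # unlock as /dev/mapper/crypt_data
mkfs.ext4 /dev/mapper/crypt_data         # then build a filesystem on top

# /etc/crypttab entry so the volume can be unlocked at boot:
# crypt_data  /dev/md0  none  luks
```

Repeat per array; the performance cost of doing this is part of what the benchmarks below are measuring.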

Cool Stuff:

  • I’ve been running a number of performance tests using bonnie++ which I will post later this week – or maybe next week due to time pressures – comparing the different RAID levels and disk encryption.
  • Aside from the silly side-mounted hard drives (more on this later) I’m loving the Lian Li case, they very rarely disappoint. The sleek black finish and the smooth minimalistic door on it really helps make it look sexy and awe-inspiring.
  • Yes, the RAM/CPU is a little lacking; the plan is to upgrade the MB, CPU and RAM to a bigger (maybe dual socket server) board later this year or early 2012.
  • The whole system, even with the disks and fans spinning along at a reasonable load, is quiet enough for me to sleep next to easily. Although, having said that, I’ll sleep through almost anything. ;-)
  • Use of the multiple 160GB drives is in order to boost the I/O performance of VM disk operations by spreading load across a large number of spindles.
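As a teaser for those benchmarks, the sort of invocation I’m running looks like this – the mount point and sizes are illustrative, not my exact test runs:

```shell
# Benchmark the VM array with bonnie++:
#   -d  directory on the filesystem under test
#   -s  total file size in MB (~2x RAM, to defeat the page cache)
#   -u  user to run as when invoked as root
bonnie++ -d /mnt/vm_store/bonnie -s 24576 -u nobody
```

Running the same command against each RAID level, with and without LUKS underneath, gives a reasonably fair comparison.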

Not-so-cool Stuff:

  • The 16GB memory limit is going to be a pain; I may have to replace the MB sooner than desired.
  • I used up all my PCIe 16x slots in order to fit both PCIe 4x SATA controllers, so I’ve lost the ability to stick more video cards or other I/O controllers – need more PCIe 4x slots in my next motherboard.
  • The dust filters on this version of the case appear to be more awkward to remove whilst running, unlike some of their past models.
  • The side-mounted hard drive cage makes it difficult to close the case sides without hitting power cables or SATA cables… using 90°-angled connectors helped for the SATA data cables, but the SATA power cables are still being annoying.

Server nudity for all you geek pervs out there!

I’ll have some more blog posts over the next week or so (even with pre-LCA chaos) detailing some of the things I’ve learnt about Xen to KVM migration and other useful bits relating to virtualisation on RHEL 6.

Happy Birthday to me!

<?php

for ($i=0; $i <= 22; $i++)
{
    print "Jethro is oo";
    
    for ($j=0; $j < $i; $j++)
    {
        print "ooo";
    }
    
    print "old\n";
}

if ($i == 88)
{
    die("horrible horrible death");
}
else
{
    print "Live long, and prosper\n";
}

?>

The OOM killer is a nasty nasty bully

As part of my two weeks of annual leave, I’ve been making good use of the spare time to work on upgrading a lot of my servers, adjusting configurations and performing a large shuffle of virtual machines between some of the hosts I have in different data centres.

As part of this work, I’ve been upgrading what was previously a DR-only host to run as full production after some nice memory and disk upgrades.

Unfortunately I ran into the beloved “Memory squeeze in netback driver” bug as per Xensource bug 762.

This delightful bug leads to a situation where although the server has about 8GB of available memory, Xen runs out of memory for networking to the VMs after a certain number of guests are started.

It’s a known fault with something to do with Xen dom0 memory ballooning – one workaround is to force dom0 to a fixed memory size, which is easy enough to do: one change in the bootloader and another in the Xen configuration files.
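For reference, the two changes look something like this on a RHEL 5 Xen host – a sketch only, with the memory figure picked as an example:

```shell
# /boot/grub/grub.conf – pin dom0 to a fixed amount of RAM at boot
#   kernel /xen.gz dom0_mem=1024M

# /etc/xen/xend-config.sxp – stop xend ballooning dom0 below that figure
#   (dom0-min-mem 1024)
```

With both in place, dom0 keeps a stable amount of memory and the netback driver stops running out of room as guests start.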

However, I had to be clever. I thought to myself “Why not just tell the Xen dom0 to set the memory now using the xm mem-set command and save a reboot?”. Sadly my brilliant idea didn’t extend to checking how much memory the host was actually using…

Since it had been running for a while, a few processes had decided to take advantage of the additional memory and didn’t take kindly to having to fit into the new size, promptly consuming the allocated 256MB plus the swap space on the host.

If you’ve never exhausted a Linux box of memory, what happens next is never fun – essentially the kernel invokes the Out Of Memory (OOM) killer, which goes and kills off the processes it thinks are most deserving of being terminated to free resources.

Whilst this sounds like a smart feature, the OOM killer isn’t actually that smart and can do some undesirable things – in this case, it went and terminated almost all the processes on the server, including both cron and SSH, in an attempt to free memory.

Whilst working on the changes I had set up a script to automatically restart the server should another remote server be unable to establish an SSH connection for 10 minutes – just in case I did something silly and killed networking. However, with cron terminated, this script was never executed.
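The watchdog was along these lines – a hypothetical reconstruction, with the peer hostname and timings made up for illustration:

```shell
#!/bin/sh
# Cron-driven watchdog: reboot if the monitoring peer is unreachable
# over SSH. Hostname is a placeholder, not a real machine of mine.
PEER="monitor.example.com"

if ! ssh -o BatchMode=yes -o ConnectTimeout=30 "$PEER" true; then
    # give networking a grace period before assuming it is dead
    sleep 600
    if ! ssh -o BatchMode=yes -o ConnectTimeout=30 "$PEER" true; then
        /sbin/reboot
    fi
fi
```

The obvious lesson in hindsight: cron itself is a single point of failure for this approach, so the check really belongs somewhere the OOM killer can’t reach – like a hardware watchdog or lights out management.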

So I now have a box that can do nothing other than ping, located in a data centre requiring a technician to power cycle it – the nightmare of any sysadmin. :-(

These situations are pretty rare these days thanks to most workloads being inside virtual machines or on servers with lights out management, but they still happen from time to time sadly. :-(

This bug is also one of the reasons why I’m really enjoying KVM on RHEL 6 over Xen on RHEL 5, so far it appears far more stable, less buggy and generally less “hacky” in nature.

Interestingly, I’ve only seen this bug on x86_64 Xen hosts… many of the bugs I find with Xen seem to be architecture-specific and often don’t happen on i386, or vice versa.

Sadly most of my production boxes still have another 12-24 months of life left before I can justify upgrading them all to shiny new KVM hosts with LOM capabilities – I look forward to when I can.

Meanwhile, I think some research into the OOM killer is needed to find out how I can best configure it not to kill key processes.

The OOM killer isn’t entirely stupid – it uses a number of heuristics to try and make the best of a bad situation, as per the documentation – but at the end of the day it’s just a really nasty tool for a problem you never ever want.
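From what I’ve read so far, the main knob on kernels of this era is /proc/&lt;pid&gt;/oom_adj – something like the following should exempt SSH from being killed. A sketch only, but -17 is the documented “never kill this process” value on RHEL 5/6 era kernels:

```shell
# Exempt all sshd processes from the OOM killer (requires root);
# -17 disables OOM killing for the process entirely
for pid in $(pidof sshd); do
    echo -17 > /proc/$pid/oom_adj
done
```

Worth wiring into the init scripts for anything you absolutely need to survive a memory squeeze – though the real fix is not to squeeze the memory in the first place.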