Setting up a Regression Test Farm for KVM

You have all the upstream code, and you’re wondering whether the internal Red Hat testing of KVM involves a lot of ‘secret sauce’. No, it does not.

However, it is a complex endeavor, since there are lots of details involved. Farm setup and maintenance are not easy, given the large number of things that can fail (machines misbehaving, network problems, git repos becoming unavailable, and so on). You have been warned.

With all that said, we’ll share what we have been doing. We cleaned up our config files and extensions and released them upstream, together with this procedure, which we hope will be useful to you. Note that this covers KVM testing on a single host, as multi-host tests and Libvirt testing are still a work in progress.

The basic steps are:

  1. Install an autotest server.
  2. Add machines to the server (test nodes). Those machines are the virt hosts that will be tested.
  3. Prepare the virt test jobs and schedule them.
  4. Set up cobbler in your environment so you can install hosts.
  5. Lots of trial and error until you get all the little details sorted out.

We spent years repeating the steps above and perfecting the process, and we will document it all as best we can. I’m afraid, however, that you’ll have to do your homework and adapt the procedure to your environment.

Some conventions

We are assuming you will install autotest to its default upstream location

/usr/local/autotest

Therefore, a lot of the paths referred to here will use this as the base dir.

CLI vs Web UI

Throughout this text, we’ll frequently use the terms CLI and Web UI.

By CLI we mean specifically the program:

/usr/local/autotest/cli/autotest-rpc-client

which is located in the autotest code checkout.

By Web UI, we mean the web interface of autotest, which can be accessed at

http://your-autotest-server.com/afe

Step 1 - Install an autotest server

Provided that you have internet access in your test lab, this should be the easiest step. Pick either a VM accessible in your lab or a bare metal machine (it really doesn’t make a difference; we use a VM here). We’ll refer to it from now on as the “Server” box.

The hard drive of the Server should have enough room for test results. We found that 250 GB holds data for more than 6 months, provided that QEMU doesn’t crash a lot.
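If the results area does fill up, a simple cron entry on the Server keeps old results under control. This is just a sketch, assuming the default results location of /usr/local/autotest/results and a roughly 6 month retention; adjust the path and age to your policy:

# Weekly cleanup: remove result directories untouched for ~6 months.
0 3 * * 0 find /usr/local/autotest/results -mindepth 1 -maxdepth 1 -type d -mtime +180 -exec rm -rf {} +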

You’ll follow the procedure described on

https://github.com/autotest/autotest/wiki/AutotestServerInstallRedHat

for Red Hat derivatives (such as Fedora and RHEL), and

https://github.com/autotest/autotest/wiki/AutotestServerInstall

for Debian derivatives (Debian, Ubuntu).

Note that using the install script referenced right at the beginning of the documentation is the preferred method, and it should work pretty well if you have internet access in your lab. If you don’t have internet there, you’ll need to follow the manual instructions that come after the ‘installing with the script’ instructions. Let us know if you have any problems.
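Whichever route you take, the code ends up in the default location mentioned in the conventions above. A rough sketch of the first manual steps on a Red Hat derivative (the wiki pages above are the authoritative, up to date reference):

$ sudo yum install -y git
$ sudo git clone https://github.com/autotest/autotest.git /usr/local/autotest
$ sudo useradd -d /usr/local/autotest autotest
$ sudo chown -R autotest:autotest /usr/local/autotest
# From here on, follow the wiki for the database, web server and scheduler setup.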

Step 2 - Add test machines

It should go without saying, but the machines you add have to be virtualization capable (that is, support KVM).

You can add machines either by using the CLI or the Web UI, following the documentation:

https://github.com/autotest/autotest/wiki/ConfiguringHosts

If you don’t want to read that, I’ll try to write a quick howto.

Say you have two x86_64 hosts, one AMD and the other, Intel. Their hostnames are:

foo-amd.bazcorp.com
foo-intel.bazcorp.com

I would create two labels, amd64 and intel64. I would also create a label to indicate that the machines can be provisioned by cobbler. This is useful because you can tell autotest to run a job on any machine that matches a given label.

Logged in as the autotest user:

$ /usr/local/autotest/cli/autotest-rpc-client label create -t amd64
Created label:
    'amd64'
$ /usr/local/autotest/cli/autotest-rpc-client label create -t intel64
Created label:
    'intel64'
$ /usr/local/autotest/cli/autotest-rpc-client label create hostprovisioning
Created label:
    'hostprovisioning'

Then I’d add each machine with the appropriate labels:

$ /usr/local/autotest/cli/autotest-rpc-client host create -t amd64 -b hostprovisioning foo-amd.bazcorp.com
Added host:
    foo-amd.bazcorp.com

$ /usr/local/autotest/cli/autotest-rpc-client host create -t intel64 -b hostprovisioning foo-intel.bazcorp.com
Added host:
    foo-intel.bazcorp.com
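You can double check that everything was registered as expected using the list actions of the same CLI (purely a sanity check):

$ /usr/local/autotest/cli/autotest-rpc-client label list
$ /usr/local/autotest/cli/autotest-rpc-client host list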

Step 3 - Prepare the test jobs

Now you have to copy the plugin we developed to extend the CLI, so it can parse the additional information needed for the virt jobs:

$ cp /usr/local/autotest/contrib/virt/site_job.py /usr/local/autotest/cli/

This should be enough to enable all the extra functionality.

You also need to copy the site-config.cfg file we published as a reference into the qemu config directory:

$ cp /usr/local/autotest/contrib/virt/site-config.cfg /usr/local/autotest/client/tests/virt/qemu/cfg

Be aware that you need to read this file carefully and later configure it for your testing needs. We especially stress that you might want to create private mirrors of the git repos you want to test, so you tax the upstream mirrors less and get increased reliability.
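For the private mirror suggestion, plain git is enough. A minimal sketch (the mirror path and the upstream URL are only examples; point site-config.cfg at whatever location you actually use):

# Create a bare mirror of the tree you want to test...
$ git clone --mirror git://git.qemu.org/qemu.git /srv/git/qemu.git
# ...and keep it fresh from cron, e.g. hourly:
# 0 * * * * cd /srv/git/qemu.git && git remote update --prune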

Right now it is able to run regression testing on Fedora 18 and upstream KVM, provided that you have a functional cobbler instance with a profile called f18-autotest-kvm that can be properly installed on your machines. Getting that set up properly may open another can of worms.

One simple way to schedule the jobs, and the one we use on our server, is to use cron to schedule daily testing jobs for the things you want to test. Here is an example that should work ‘out of the box’. Provided that you have an internal mailing list created to receive email notifications, called autotest-virt-jobs@foocorp.com, you can put the following in the crontab of the autotest user on the Server:

07 00 * * 1-7 /usr/local/autotest/cli/autotest-rpc-client job create -B never -a never -s -e autotest-virt-jobs@foocorp.com -f "/usr/local/autotest/contrib/virt/control.template" -T --timestamp -m '1*hostprovisioning' -x 'only qemu-git..sanity' "Upstream qemu.git sanity"
15 00 * * 1-7 /usr/local/autotest/cli/autotest-rpc-client job create -B never -a never -s -e autotest-virt-jobs@foocorp.com -f "/usr/local/autotest/contrib/virt/control.template" -T --timestamp -m '1*hostprovisioning' -x 'only f18..sanity' "Fedora 18 koji sanity"
07 01 * * 1-7 /usr/local/autotest/cli/autotest-rpc-client job create -B never -a never -s -e autotest-virt-jobs@foocorp.com -f "/usr/local/autotest/contrib/virt/control.template" -T --timestamp -m '1*hostprovisioning' -x 'only qemu-git..stable' "Upstream qemu.git stable"
15 01 * * 1-7 /usr/local/autotest/cli/autotest-rpc-client job create -B never -a never -s -e autotest-virt-jobs@foocorp.com -f "/usr/local/autotest/contrib/virt/control.template" -T --timestamp -m '1*hostprovisioning' -x 'only f18..stable' "Fedora 18 koji stable"
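To install these entries as the autotest user, for example (virt-jobs.cron is just a made-up file name holding the lines above):

# Edit the crontab interactively and paste the entries...
$ crontab -e
# ...or append them from a file without clobbering existing entries:
$ (crontab -l 2>/dev/null; cat virt-jobs.cron) | crontab -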

That should be enough to have one sanity and one stable job for:

  • Fedora 18.
  • qemu.git userspace and kvm.git kernel.

What do these ‘stable’ and ‘sanity’ jobs do? In short:

  • Host OS (Fedora 18) installation through cobbler
  • Installation of the latest kernel for the host OS (either the latest kernel update build for Fedora, or a checkout, build and install of kvm.git).

sanity job

  • Installs the latest Fedora 18 qemu-kvm, or compiles the latest qemu.git
  • Installs a Fedora 18 guest, boots it, reboots it, and does simple single-host migration with all supported protocols
  • Takes about two hours. Internally we test more guests (RHEL 6 and Windows 7), but those are not widely available, so here we just replaced them with Fedora 18.

stable job

  • Same as above, but with many more networking, time drift and other tests

Step 4 - Set up cobbler to install hosts

Cobbler is an installation server that controls DHCP and/or PXE boot for your x86_64 bare metal virtualization hosts. You can learn how to set it up from the following resource:

https://github.com/cobbler/cobbler/wiki

You will set it up for simple installations; you probably just need to import a Fedora 18 DVD into it so it can be used to install your hosts. After the import procedure, you’ll have a ‘profile’ created, which is a label describing an OS that can be installed on your virtualization hosts. The profile name we chose, as already mentioned, is f18-autotest-kvm. If you want to change that name, you’ll have to change site-config.cfg accordingly.

Also, you will have to add your test machines to your cobbler server and set up remote power control (power on/off) for them.
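To give a rough idea of what that involves, here is a sketch (distro paths, names, MAC addresses and power management details are all examples; check the cobbler documentation for the real thing):

# Import the Fedora 18 DVD; cobbler creates a distro and a profile from it.
$ sudo mount -o loop Fedora-18-x86_64-DVD.iso /mnt/dvd
$ sudo cobbler import --name=f18 --path=/mnt/dvd
# Make sure you end up with a profile named exactly f18-autotest-kvm
# (the imported profile name may differ; check 'cobbler profile list').
$ sudo cobbler profile rename --name=f18-x86_64 --newname=f18-autotest-kvm
$ sudo cobbler profile edit --name=f18-autotest-kvm \
    --kickstart=/var/lib/cobbler/kickstarts/f18-autotest-kvm.ks
# Register a test machine under the same name it has in autotest, with
# remote power control so it can be rebooted and reinstalled automatically.
$ sudo cobbler system add --name=foo-amd.bazcorp.com --profile=f18-autotest-kvm \
    --mac=AA:BB:CC:DD:EE:FF --power-type=ipmilan \
    --power-address=foo-amd-mgmt.bazcorp.com --power-user=admin --power-pass=secret
$ sudo cobbler sync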

The following is important:

The hostname of your machine in the autotest server has to be the name of your system in cobbler.

So, for the hypothetical example, you’ll have to set up systems named foo-amd.bazcorp.com and foo-intel.bazcorp.com in cobbler. That’s right, the ‘name’ of the system has to be the ‘hostname’. Otherwise, autotest will ask cobbler about a machine and cobbler will not know which machine autotest is talking about.
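A quick way to check that the two sides agree on names (run the CLI command on the Server and the cobbler command on the cobbler box):

# Hosts known to autotest...
$ /usr/local/autotest/cli/autotest-rpc-client host list
# ...and systems known to cobbler; the names must match exactly.
$ sudo cobbler system list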

Other assumptions we have here:

1) We have a read-only NFS share (read-only to avoid people deleting ISOs by mistake) that holds the Fedora 18 DVD and other ISOs. The structure of the base dir could look something like:

.
|-- linux
|   `-- Fedora-18-x86_64-DVD.iso
`-- windows
    |-- en_windows_7_ultimate_x64_dvd_x15-65922.iso
    |-- virtio-win.iso
    `-- winutils.iso

This is just in case you are legally entitled to download and use Windows 7, for example.

2) We have another NFS share with space for backups of qcow2 images that got corrupted during testing, so that people can analyze them later. The structure would be:

.
|-- foo-amd
`-- foo-intel

That is, one directory for each host machine you have on your grid. Make sure they end up being properly configured in the kickstart.
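Both shares need matching entries in /etc/exports on the NFS servers. A sketch with made-up paths and a host wildcard you will probably want to tighten (no_root_squash is there because the hosts mount and write as root):

# Share 1: ISOs (read-only) and steps_data (read-write)
/base_path/iso          *.bazcorp.com(ro,no_subtree_check)
/base_path/steps_data   *.bazcorp.com(rw,no_root_squash,no_subtree_check)
# Share 2: base dir that holds the per-host backup directories
/base_path              *.bazcorp.com(rw,no_root_squash,no_subtree_check)

Run exportfs -ra after editing the file so the changes take effect.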

Now here is an excerpt of a kickstart with some of the gotchas we learned from experience. Some notes:

  • This is not a fully formed, functional kickstart, just in case you didn’t notice.
  • This is provided in the hopes you read it, understand it and adapt things to your needs. If you paste this into your kickstart and tell me it doesn’t work, I WILL silently ignore your email, and if you insist, your emails will be filtered out and go to the trash can.

install
text
reboot
lang en_US
keyboard us
rootpw --iscrypted [your-password]
firewall --disabled
selinux --disabled
timezone --utc America/New_York
firstboot --disable
services --enabled network --disabled NetworkManager
bootloader --location=mbr
ignoredisk --only-use=sda
zerombr
clearpart --all --drives=sda --initlabel
autopart
network --bootproto=dhcp --device=eth0 --onboot=on

%packages
@core
@development-libs
@development-tools
@virtualization
wget
dnsmasq
genisoimage
python-imaging
qemu-kvm-tools
gdb
iasl
libvirt
python-devel
ntpdate
gstreamer-plugins-good
gstreamer-python
dmidecode
popt-devel
libblkid-devel
pixman-devel
mtools
koji
tcpdump
bridge-utils
dosfstools
%end

%post

echo "[nfs-server-that-holds-iso-images]:[nfs-server-that-holds-iso-images]/base_path/iso /var/lib/virt_test/isos nfs ro,defaults 0 0" >> /etc/fstab
echo "[nfs-server-that-holds-iso-images]:[nfs-server-that-holds-iso-images]/base_path/steps_data  /var/lib/virt_test/steps_data nfs rw,nosuid,nodev,noatime,intr,hard,tcp 0 0" >> /etc/fstab
echo "[nfs-server-that-has-lots-of-space-for-backups]:/base_path/[dir-that-holds-this-hostname-backups] /var/lib/virt_test/images_archive nfs rw,nosuid,nodev,noatime,intr,hard,tcp 0 0" >> /etc/fstab
mkdir -p /var/lib/virt_test/isos
mkdir -p /var/lib/virt_test/steps_data
mkdir -p /var/lib/virt_test/images
mkdir -p /var/lib/virt_test/images_archive

mkdir --mode=700 /root/.ssh
echo 'ssh-dss [the-ssh-key-of-the-Server-autotest-user]' >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

ntpdate [your-ntp-server]
hwclock --systohc

systemctl mask tmp.mount
%end
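Once a host finishes its first install, it is worth checking by hand that the NFS mounts from the %post section actually came up. A one-off check from the Server, as the autotest user:

$ ssh root@foo-amd.bazcorp.com 'df -h /var/lib/virt_test/isos /var/lib/virt_test/images_archive'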

Step 5 - Painful trial and error process to adjust details

After all that, you can start running your test jobs and see what needs to be fixed. You can run a job easily by logging into your Server as the autotest user and using the command:

$ /usr/local/autotest/cli/autotest-rpc-client job create -B never -a never -s -e autotest-virt-jobs@foocorp.com -f "/usr/local/autotest/contrib/virt/control.template" -T --timestamp -m '1*hostprovisioning' -x 'only f18..sanity' "Fedora 18 koji sanity"
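Once the job is scheduled, you can follow it in the Web UI under /afe, or keep an eye on it from the CLI (the job id is printed by the create command; 42 below is just an example):

$ /usr/local/autotest/cli/autotest-rpc-client job list
$ /usr/local/autotest/cli/autotest-rpc-client job stat 42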

As you might have guessed, this will schedule a Fedora 18 sanity job. So go through it and fix things step by step. If you get stuck, you can take a look at this:

https://github.com/autotest/autotest/wiki/DiagnosingFailures

And see if it helps. You can also ask on the mailing list, but please, pretty please, do your homework before asking us to guide you through the whole process step by step. This is already a step by step procedure.

All right, good luck, and happy testing!