Re: [RFC] KVM-Autotest: basic parallel test execution

----- "Ryan Harper" <ryanh@xxxxxxxxxx> wrote:

> * Michael Goldish <mgoldish@xxxxxxxxxx> [2009-05-17 09:50]:
> > Hi all,
> > 
> > We've recently implemented a very simple form of parallel test
> > execution in KVM-Autotest and we'd like some feedback on it. This
> > suggestion allows the user to manually assign tests to hosts/queues.
> > It also takes care of assigning different MAC address ranges to
> > hosts/queues. By 'queues' I mean parallel execution pipelines. Each
> > host has one or more queues. The number of queues is defined by the
> > user and should reflect the capabilities of the host.
> > 
> > This implementation involves only minor modifications to the code
> > itself; most of the work is done in a new config file, kvm_hosts.cfg,
> > which has the exact same format as kvm_tests.cfg. The new file
> > provides the framework with information about hosts/queues. The new
> > file is parsed after kvm_tests.cfg. The test sets (such as
> > 'nightly' and 'weekly'), previously defined at the end of
> > kvm_tests.cfg, should now be defined last, after kvm_hosts.cfg.
> > Test sets no longer select only the tests to execute, but also
> > where each test should be executed (i.e. on what host/queue).
> > 
> > The final result of parsing the config files is a list of tests, each
> > with its own 'hostname' and 'queue' parameters. Each host executes
> > only the tests whose 'hostname' parameter matches the current host,
> > and puts tests with different 'queue' values in parallel pipelines
> > of execution.
> > 
> > Ideally, the Autotest server should take care of assigning tests to
> > hosts automatically, but there are still a few technical difficulties
> > to be resolved before we can implement that. We're considering the
> > current suggestion as a temporary solution until a better one is
> > found.
> > 
> > Basically, the advantages are:
> > - allows the user full control over what tests run, and where/how
> >   they run
> > - takes care of assigning MAC address ranges to different
> >   hosts/queues (required for TAP networking)
> > - can be used from the server or with the client, which makes it
> >   relevant also for users who don't have an Autotest server installed
> > - involves only minor code changes (except for the config files)
> > - is pretty much the simplest possible solution (and simple is good)
> > 
> > Drawbacks:
> > - requires some initial work to be done by the user -- the user has
> >   to define exactly where each test should run
> > - test sets need to be modified when tests or hosts are added/removed,
> >   to include/exclude them
> 
> I took a slightly different approach.  The kvm_tests.cfg file already
> provides a dependency relationship between different tests.  I modified
> the main loop in the control file to walk the entire list of jobs and
> pull out any jobs that don't have any dependencies (i.e., install
> tests), then run N jobs in parallel from that list until it is
> exhausted and store the results.  Then it loops over the remaining
> list of jobs again, finding the jobs that can now be run.

I like this approach. Looks like it's somewhere between the simple
static way and the fully automatic server way.
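If I understand correctly, the loop is roughly equivalent to the following
sketch (my reconstruction, not your actual patch -- 'deps', 'name' and
'run_job' are made-up names, and I'm assuming job.parallel() takes
[function, args...] lists):

    # Hypothetical reconstruction of the dependency-driven main loop.
    remaining = list(all_jobs)
    done = set()
    while remaining:
        # Jobs whose dependencies have all completed are runnable now.
        runnable = [j for j in remaining if set(j.deps) <= done]
        if not runnable:
            break  # remaining jobs have unsatisfiable dependencies
        # Run up to N of them at a time until this batch is exhausted.
        for i in range(0, len(runnable), N):
            batch = runnable[i:i + N]
            job.parallel(*[[run_job, j] for j in batch])
            done.update(j.name for j in batch)
        remaining = [j for j in remaining if j.name not in done]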

> On a larger multi core system, one might set the number of parallel
> jobs equal to the number of cores.

It makes sense to define 'threads' in the control file, because it's
definitely not a test param, and the control file is easiest to change from
the server. However, I wonder how different values of 'threads' can be
defined for different hosts (with different capabilities) from the server.

> I think this works well with using autoserv to farm out different
> kvm_tests.cfg to different machines.

But still we'd have to manually (or automatically) divide kvm_tests.cfg
between the hosts, right? Either that, or we send all hosts the same
kvm_tests.cfg.

> Attaching my stale patch just for comment.  Needs to be updated since
> I sat on this for a while.  There were a number of issues:
> 
> - kvm_log is a shared resource, fixed it up so parallel jobs can both
>   call it

Maybe I missed something, but I think job.parallel() forks, so why do we need
to change kvm_log? How is it shared?
I encountered no problems with it when I tried running tests in parallel, but
maybe I didn't look carefully enough.

> - vnc, redir and other network resources are shared, so in the
>   kvm_tests.cfg file each job needs a parallel offset.

Or we can lock a file at the beginning of VM.create() and unlock it once the VM
is running (and has taken the ports assigned to it).

I have a patch that does this. I'll post it soon hopefully (after some more
testing).
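For reference, the idea is roughly this (a minimal sketch, assuming fcntl
and an agreed-on lock file path -- both illustrative, not the actual patch):

    import fcntl

    LOCK_FILE = "/tmp/kvm_autotest_ports.lock"  # assumed shared path

    lock_file = open(LOCK_FILE, "w")
    # Taken at the start of VM.create():
    fcntl.flock(lock_file, fcntl.LOCK_EX)
    try:
        # ... find free vnc/redir ports and start the VM; no other
        # VM.create() can race us for the same ports while we hold
        # the lock ...
        pass
    finally:
        # Released once the VM is running and owns its ports:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()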

> - in the kvm_tests.cfg file we need to define additional vm and nic
>   objects, one for each parallel thread.

But then how do you assign VMs to tests? The user doesn't know in advance which
thread takes each VM. Can you provide a simple example config file to illustrate
this?

I see that you reset 'vms' and 'main_vm' before running each test. I'm not sure
you're supposed to decide for the user what VMs to use. The user might need more
than one VM (for migration or some stress test), and the user may choose to set
'main_vm' to any of these VMs, not necessarily one with a name like 'vm1'.

My solution was to use separate environment files (the default one is 'env').
That way there can be several VMs with the same name, living in different
environments.
This can be achieved by passing the env filename to kvm_runtest_2 as a parameter,
e.g. job.run_test("kvm_runtest_2", params=%s, env_filename='env2', ...
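For example, two queues might look something like this in a control file
(a sketch with made-up names -- 'run_queue' and the params lists are
illustrative, not real code):

    # Each queue gets its own env file so the shelve databases never
    # collide between parallel pipelines.
    def run_queue(queue_params, env_filename):
        for params in queue_params:
            job.run_test("kvm_runtest_2", params=params,
                         env_filename=env_filename,
                         tag=params.get("shortname"))

    job.parallel([run_queue, queue1_params, "env1"],
                 [run_queue, queue2_params, "env2"])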

Another possible solution is to give each guest a different VM name in kvm_tests.cfg.

However, I still think it's safest to avoid sharing an env file, because who knows
what happens inside Python's shelve module and what sort of corruption we might get.
In fact, the docs state that shelve doesn't handle concurrent access at all -- it's
up to the user to maintain database integrity.

> Advantages:
>     - works a lot like the single-threaded model does, and if threads=1
>       it runs the same path
>     - config files don't change significantly, just some additional
>       VM objects at the top and some offset values
>     - transparent to an autoserv setup; autoserv would just need to
>       specify the kvm_tests.cfg file for each host.
>
> Disadvantages:
>     - the main loop waits for each group of parallel jobs to complete
>       before starting any more.  If somehow an install is mixed with a
>       reboot test, we'll wait around before starting more jobs
>     - probably a few more here, but I can't think of them off the top
>       of my head.

I have a feeling the code can be simplified a little. I'll try to write something
to illustrate what I mean, but it might take me a while.

Also, it shouldn't be too hard to support several queues that run continuously,
without having to wait for long tests to complete. It can be done using pickle
or something similar, with a shared queue file that is accessed with locks
by the parallel tasks.
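A minimal sketch of what I have in mind (illustrative only -- the file
name, format and helper are made up):

    import fcntl, pickle

    QUEUE_FILE = "queue"  # shared between the parallel tasks

    def take_next_job():
        # Pop the next job from the shared queue file while holding an
        # exclusive lock, so two tasks never take the same job.
        queue_file = open(QUEUE_FILE, "rb+")
        fcntl.flock(queue_file, fcntl.LOCK_EX)
        try:
            jobs = pickle.load(queue_file)
            next_job = jobs.pop(0) if jobs else None
            queue_file.seek(0)
            queue_file.truncate()
            pickle.dump(jobs, queue_file)
        finally:
            fcntl.flock(queue_file, fcntl.LOCK_UN)
            queue_file.close()
        return next_job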

In any case, we should see if this works well with the server. If it doesn't
(possibly because a server control file would cover this functionality anyway),
then this can remain a client-only solution.

One more question -- what is the purpose of the changes you made to job.py?
I know autotest already supports parallel execution, so I wonder what functionality
was missing.

Thanks,
Michael