Re: [RFC] KVM-Autotest: basic parallel test execution

Ryan Harper <ryanh@xxxxxxxxxx> · Thu, 21 May 2009 07:11:04 -0500

* Michael Goldish <mgoldish@xxxxxxxxxx> [2009-05-20 18:15]:
> ----- "Ryan Harper" <ryanh@xxxxxxxxxx> wrote:
> > * Michael Goldish <mgoldish@xxxxxxxxxx> [2009-05-17 09:50]:
> > 
> > I took a slightly different approach.  The kvm_tests.cfg file already
> > provides a dependency relationship between different tests.  I modified
> > the main loop in the control file to walk the entire list of jobs and
> > pull out any jobs that don't have any dependencies (i.e., install
> > tests), and then run N jobs in parallel from that list until it is
> > exhausted; then store the results.  Then loop over the remaining list
> > of jobs again, finding the jobs that can now be run.
> 
> I like this approach. Looks like it's somewhere between the simple
> static way and the fully automatic server way.
> 
> > On a larger multi core system, one might set the number of parallel
> > jobs equal to the number of cores.
> 
> It makes sense to define 'threads' in the control file, because it's
> definitely not a test param, and the control file is easiest to change from
> the server. However, I wonder how different values of 'threads' can be
> defined for different hosts (with different capabilities) from the server.

Yeah, I'm pretty sure you can generate the control file dynamically via
autoserv, so each host could get its own 'threads' value.
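For illustration, a minimal sketch of per-host control-file generation, assuming the server side can render the control file as text before shipping it (how autoserv actually delivers it is not shown). `TEMPLATE`, `control_for_host()` and the host table are hypothetical. Passing `threads=None` emits an expression instead of a number, so the "one job per core" default is computed on the client rather than on the server.

```python
import os

# Hypothetical client control-file body; only the 'threads' line varies per host.
TEMPLATE = """\
import os
threads = {threads_expr}
config_file = {config!r}
"""


def control_for_host(config, threads=None):
    """Render control-file text for one host.

    threads=None defers the choice to the client: the generated file sets
    threads to that machine's core count, not the server's.
    """
    if threads is None:
        threads_expr = "os.cpu_count() or 1"
    else:
        threads_expr = str(int(threads))
    return TEMPLATE.format(threads_expr=threads_expr, config=config)


# Hypothetical host table: explicit values where known, None means "one per core".
HOSTS = {"bighost": 8, "laptop": 2, "unknown": None}
```

Usage: `control_for_host("kvm_tests.cfg", HOSTS["bighost"])` yields a file that sets `threads = 8`, while the `"unknown"` host gets whatever its own `os.cpu_count()` reports.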

> 
> > I think this works well with using autoserv to farm out different
> > kvm_tests.cfg to different machines.
> 
> But still we'd have to manually (or automatically) divide kvm_tests.cfg
> between the hosts, right? Either that, or we send all hosts the same
> kvm_tests.cfg.

Yes.  Manual is the easiest first step.

> > Attaching my stale patch just for comment.  Needs to be updated since
> > I sat on this for a while.  There were a number of issues:
> > 
> > - kvm_log is a shared resource, fixed it up so parallel jobs can both
> >   call it
> 
> Maybe I missed something, but I think job.parallel() forks, so why do we need
> to change kvm_log? How is it shared?
> I encountered no problems with it when I tried running tests in parallel, but
> maybe I didn't look carefully enough.

Looking through my modifications, I think I wanted the logger to include
the PID of each thread so one could tell which job wrote what to the
log.
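A minimal sketch of the idea, assuming a logger built on Python's standard `logging` module (kvm_log's actual implementation may differ). The `%(process)d` format attribute records the PID of whichever process emits the record, so each forked job tags its own lines. `make_pid_logger()` is a hypothetical helper name.

```python
import logging
import sys


def make_pid_logger(name="kvm_test", stream=None):
    """Return a logger whose records are tagged with the emitting PID."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream or sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [pid %(process)d] %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # don't also emit through the root logger
    return logger
```

Because job.parallel() forks, a logger created before the fork is inherited by every child, and each child's records carry that child's PID.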

> 
> > - vnc, redir and other network resources are shared, so, in
> > kvm_tests.cfg file each job needs a parallel offset.
> 
> Or we can lock a file at the beginning of VM.create() and unlock it once the VM
> is running (and has taken the ports assigned to it).
> 
> I have a patch that does this. I'll post it soon hopefully (after some more
> testing).

I think the offset method is simpler and needs no locks.
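Roughly what the offset scheme amounts to, as a sketch: each parallel job shifts every base port by `thread_index * offset`. The port names, base values and the `offset` parameter are hypothetical, not actual kvm_tests.cfg parameters. The collision check shows why the offset has to be wider than the spread of the base ports.

```python
def ports_for_thread(thread_index, base_ports, offset):
    """Shift every base port by thread_index * offset."""
    return {name: port + thread_index * offset
            for name, port in base_ports.items()}


def offsets_collide(threads, base_ports, offset):
    """True if any two (thread, port) assignments land on the same number."""
    seen = set()
    for t in range(threads):
        for port in ports_for_thread(t, base_ports, offset).values():
            if port in seen:
                return True
            seen.add(port)
    return False


# Hypothetical base ports: a VNC port and a guest SSH redirection.
BASE = {"vnc": 5900, "ssh_redir": 5905}
```

With an offset of 10, four threads get disjoint ports; with an offset of 5, thread 1's VNC port lands on thread 0's SSH redirection.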

> 
> > - in kvm_tests.cfg file need to define additional vm and nic objects,
> >    one for each parallel threads.
> 
> But then how do you assign VMs to tests? The user doesn't know in advance which
> thread takes each VM. Can you provide a simple example config file to illustrate
> this?

I don't have the config I used around anymore, but I doubt I was
thinking that much about this.

> 
> I see that you reset 'vms' and 'main_vm' before running each test. I'm not sure
> you're supposed to decide for the user what VMs to use. The user might need more
> than one VM (for migration or some stress test), and the user may choose to set
> 'main_vm' to any of these VMs, not necessarily one with a name like 'vm1'.

I definitely wasn't worried about allowing the user to pick these
assignments, mainly because I didn't see much value; probably because I
haven't written a sophisticated enough config for it to matter.

> 
> My solution was to use separate environment files (the default one is 'env').
> That way there can be several VMs with the same name, living in different
> environments.

Yeah, I was trying to do that, but failed to get it working.  That
should allow the same definitions (like the main_vm name, etc.) to be
duplicated between threads, but since we'd still have to share port
resources, I think we can keep the same env and use an offset to
parallelize access to shared resources.

> This can be achieved by passing the env filename to kvm_runtest_2 as a parameter,
> e.g. job.run_test("kvm_runtest_2", params=%s, env_filename='env2', ...
> 
> Another possible solution is to give each guest a different VM name in kvm_tests.cfg.
> 
> However, I still think it's safest to avoid sharing an env file, because who knows
> what happens inside python's shelve module and what sort of corruptions we might get.
> In fact, the docs state that shelve doesn't handle parallel access at all -- it's up
> to the user to maintain database integrity.

OK.  I don't care that much about using the same env file; I was just
looking for the simplest route to parallel jobs.  If we can safely use
the same env, I think that is the simplest method.  If not, then I think
we can work out a way to use an env per thread.
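If we do go the env-per-thread route, a minimal sketch following Michael's `env_filename='env2'` example: thread 0 keeps the default `env` and thread N uses `env<N+1>`, so no shelve file is ever opened by two jobs at once. `env_filename_for()`, `save_vm()`, `load_vm()` and the `vm__` key prefix are hypothetical, not kvm-autotest's actual env layout.

```python
import os
import shelve


def env_filename_for(thread_index, base="env"):
    """Thread 0 keeps 'env'; thread N gets 'env<N+1>' (env2, env3, ...)."""
    return base if thread_index == 0 else "%s%d" % (base, thread_index + 1)


def save_vm(env_dir, thread_index, vm_name, vm_info):
    """Record a VM in this thread's private shelve; no other job opens it."""
    path = os.path.join(env_dir, env_filename_for(thread_index))
    with shelve.open(path) as env:
        env["vm__" + vm_name] = vm_info


def load_vm(env_dir, thread_index, vm_name):
    """Look up a VM in this thread's shelve; returns None if absent."""
    path = os.path.join(env_dir, env_filename_for(thread_index))
    with shelve.open(path) as env:
        return env.get("vm__" + vm_name)
```

Both threads can then use the same VM name (say `vm1`) and the same main_vm setting, because each record lives in a separate shelve file.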

> 
> > Advantages:
> >     - works a lot like the single threaded model does, and if threads=1
> >     then it runs the same path
> >     - config files don't change significantly, just some additional
> >     VM objects at the top and some offset values
> >     - transparent to an autoserv setup, autoserv would just need to
> >     specify the kvm_tests.cfg file to each host.
> >
> > Disadvantages:
> >     - the main loop waits for each group of parallel jobs to complete
> >     before starting any more.  If somehow an install is mixed with a
> >     reboot test, we'll wait around before starting more jobs
> >     - probably a few more here, but I don't have them on the top of my
> >     head.
> 
> I have a feeling the code can be simplified a little. I'll try to write something
> to illustrate what I mean, but it might take me a while.

Indeed.  I'll see about cleaning this up and reposting in the meantime.
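For discussion, a minimal sketch of the wave-based main loop described at the top of this thread, with hypothetical names throughout: jobs are dicts with `name` and `depend` keys (a stand-in for whatever kvm_tests.cfg produces), `run_one()` returns pass/fail, and a thread pool stands in for job.parallel(), which forks instead. It assumes a job becomes runnable only once all of its dependencies have passed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_waves(jobs, threads, run_one):
    """Run jobs in dependency order, at most `threads` at a time.

    Returns (results, batches): results maps name -> bool, and batches
    lists the job names of each parallel group in execution order.
    """
    results = {}
    batches = []
    remaining = list(jobs)
    while remaining:
        # Runnable now: every dependency has already finished and passed.
        ready = [j for j in remaining
                 if all(results.get(d) for d in j.get("depend", []))]
        if not ready:
            break  # what's left depends on failed or unknown tests
        for i in range(0, len(ready), threads):
            batch = ready[i:i + threads]
            batches.append([j["name"] for j in batch])
            # The whole batch must finish before the next one starts.
            with ThreadPoolExecutor(max_workers=threads) as pool:
                outcomes = list(pool.map(run_one, batch))
            for job, ok in zip(batch, outcomes):
                results[job["name"]] = ok
        remaining = [j for j in remaining if j["name"] not in results]
    return results, batches
```

With `threads=1` every batch holds a single job, which matches the existing serial path. The wait-per-group disadvantage listed above is visible too: a long install in one batch holds back every job in the next.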

> 
> Also, it shouldn't be too hard to support several queues that run continuously,
> without having to wait for long tests to complete. It can be done using pickle
> or something similar, with a shared queue file that is accessed with locks
> by the parallel tasks.

Sounds fancy.  I'd like to see if we can get something simple in first.
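For reference, a minimal sketch of the shared queue file Michael describes, assuming POSIX `fcntl.flock` locking and a pickled list of job names. `init_queue()`, `pop_job()` and `worker()` are hypothetical helpers, and this version ignores dependencies, which a real scheduler would still need to honor.

```python
import fcntl
import pickle


def init_queue(path, jobs):
    """Write the initial job list to the shared queue file."""
    with open(path, "wb") as f:
        pickle.dump(list(jobs), f)


def pop_job(path):
    """Atomically remove and return the next job, or None if empty."""
    with open(path, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # exclusive: one worker at a time
        try:
            jobs = pickle.load(f)
            if not jobs:
                return None
            job = jobs.pop(0)
            f.seek(0)
            f.truncate()  # rewrite the file with the shortened list
            pickle.dump(jobs, f)
            f.flush()
            return job
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def worker(path, run_one):
    """Keep pulling jobs until the queue is empty; never waits on peers."""
    while True:
        job = pop_job(path)
        if job is None:
            return
        run_one(job)
```

Each parallel task would run `worker()` against the same file, so a slow test only occupies its own worker instead of stalling a whole group.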

> 
> In any case, we should see if this works well with the server. If it doesn't
> (possibly because if we use a server control file we'll cover this functionality anyway),
> then this can be a client-only solution.

Right, I don't have a proper autoserv setup at the moment so once I
repost the patch, it would be good for someone who does have this setup
to give it a spin.

> 
> One more question -- what is the purpose of the changes you made to job.py?
> I know autotest already supports parallel execution, so I wonder what functionality
> was missing.

The job.py in kvm-autotest only returns one return code for the whole
parallel job, but in kvm_runtest_2 we need a return code for each thread
so we can record whether each test was successful.
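To illustrate what per-thread return codes mean, a minimal sketch that forks one child per test (as job.parallel() does) and collects each child's exit status instead of a single aggregate. `run_parallel()` is hypothetical and is not the job.py change itself. It uses `os.fork`, so it is POSIX-only.

```python
import os


def run_parallel(funcs):
    """Fork one child per callable; return one exit code per child.

    0 = callable returned a true value, 1 = returned false, 2 = raised.
    """
    pids = []
    for func in funcs:
        pid = os.fork()
        if pid == 0:  # child: run the test, report via exit status only
            try:
                code = 0 if func() else 1
            except Exception:
                code = 2
            os._exit(code)  # skip parent's cleanup handlers in the child
        pids.append(pid)
    codes = []
    for pid in pids:  # wait in fork order so codes line up with funcs
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return codes
```

A caller can then mark each test individually, e.g. `zip(test_names, run_parallel(test_funcs))`, rather than failing or passing the whole group together.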

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@xxxxxxxxxx