Job Control API [RFC]

Tucker DiNapoli <t.dinapoli42@xxxxxxxxx> · Wed, 21 May 2014 11:13:06 -0400

  My name is Tucker DiNapoli and I am working on implementing job control for
the storage driver for Google Summer of Code. The first step in doing this
is creating and implementing a unified API for job control.

Currently there are several places where various aspects of job control are
implemented. The qemu and libxl drivers both contain internal implementations
for job control on domain level jobs, with the qemu driver containing support
for asynchronous jobs. There is also code in the libvirt.c file for running
block jobs and for querying domain jobs for information.

I would like for the job control API to be as independent of different drivers
as possible since it will need to be used with storage drivers as well as
different virtualization drivers.

I imagine most of the API will revolve around a job object, and I think it's
important to decide what exactly should go in this job object.

The following is a reply to my first post on the mailing list, and I think the
idea it proposes is a good one:

>>I'd _really_ like to see a common notion of a 'job id' that EVERY job
>>(whether domain-level, like migration; or block-level, like
>>commit/pull/rebase; or storage-level, like your new proposed storage
>>jobs) shares a common job namespace.  The job id is a positive integer.
>> Existing APIs will have to be retrofitted into the new job id notion;
>>any action that starts a long-running job that currently returns 0 on
>>success could be changed to return a positive job id; or we may need a
>>new API that queries the notion of the 'current job' (the job most
>>recently started) or even to set the 'current job' to a different job
>>id.  We'll need new API for querying a job by id, and to be most
>>portable, we should do job reporting via virTypedParameter
>>(virDomainGetJobInfo and virDomainGetBlockJobInfo are hardcoded into
>>returning a struct, so they are non-extensible; virDomainGetJobStats
>>almost did it right, except that the user has to call it twice, once to
>>learn how large to allocate, and again to pass in pre-allocated memory -
>>the ideal API would allocate the memory on a single call).
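
To make the two ideas in the quote concrete, here is a rough sketch of a job
id counter shared by every kind of job, plus a query call that allocates its
result in a single call. Every name here (jobRegister, jobGetStats, jobParam)
is hypothetical, and jobParam is only a simplified stand-in for
virTypedParameter. The sketch ignores locking entirely.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing libvirt API. */

typedef enum {
    JOB_KIND_DOMAIN,
    JOB_KIND_BLOCK,
    JOB_KIND_STORAGE
} jobKind;

/* Simplified stand-in for virTypedParameter: a named integer value. */
typedef struct {
    char field[32];
    long long value;
} jobParam;

typedef struct {
    int id;                 /* positive, unique across all job kinds */
    jobKind kind;
    long long processed;
    long long total;
} jobEntry;

#define MAX_JOBS 16
static jobEntry jobs[MAX_JOBS];
static int njobs;
static int nextJobId = 1;   /* one counter shared by every job kind */

/* Register a job of any kind; returns its positive id, or -1 if full. */
int jobRegister(jobKind kind, long long total)
{
    if (njobs == MAX_JOBS)
        return -1;
    jobs[njobs].id = nextJobId++;
    jobs[njobs].kind = kind;
    jobs[njobs].processed = 0;
    jobs[njobs].total = total;
    return jobs[njobs++].id;
}

/* Query a job by id in one call: the array is allocated here and the
 * caller releases it with free(). Returns 0 on success, -1 on error. */
int jobGetStats(int id, jobParam **params, int *nparams)
{
    for (int i = 0; i < njobs; i++) {
        if (jobs[i].id != id)
            continue;
        jobParam *p = calloc(3, sizeof(*p));
        if (!p)
            return -1;
        strcpy(p[0].field, "kind");
        p[0].value = jobs[i].kind;
        strcpy(p[1].field, "processed");
        p[1].value = jobs[i].processed;
        strcpy(p[2].field, "total");
        p[2].value = jobs[i].total;
        *params = p;
        *nparams = 3;
        return 0;
    }
    return -1;              /* no job with that id */
}
```

Because the result is a list of named fields rather than a fixed struct, new
fields could be added later without changing the function signature.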

Currently there are separate types for block job info and job info. If
possible, I would like to merge these into a common job info type, and perhaps
make this a part of the job object itself.
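
One possible shape for a merged info type is sketched below. The two source
structs are deliberately simplified stand-ins with only a few fields, not the
real layouts of the existing libvirt info structs, and all names (jobInfo,
jobInfoFromDomain, jobInfoFromBlock, jobInfoPercent) are hypothetical.

```c
/* Hypothetical types; domainJobInfo and blockJobInfo are simplified
 * stand-ins, not the real libvirt struct layouts. */

typedef struct {
    int type;
    unsigned long long dataTotal;
    unsigned long long dataProcessed;
} domainJobInfo;

typedef struct {
    int type;
    unsigned long bandwidth;
    unsigned long long cur;
    unsigned long long end;
} blockJobInfo;

/* Common info type that both kinds of job can be expressed in. */
typedef struct {
    int type;
    unsigned long bandwidth;        /* 0 when not applicable */
    unsigned long long processed;
    unsigned long long total;
} jobInfo;

jobInfo jobInfoFromDomain(const domainJobInfo *d)
{
    jobInfo info = { d->type, 0, d->dataProcessed, d->dataTotal };
    return info;
}

jobInfo jobInfoFromBlock(const blockJobInfo *b)
{
    jobInfo info = { b->type, b->bandwidth, b->cur, b->end };
    return info;
}

/* Progress as an integer percentage, or -1 if the total is unknown.
 * Note: processed * 100 can overflow for extremely large counts. */
int jobInfoPercent(const jobInfo *info)
{
    if (info->total == 0)
        return -1;
    return (int)(info->processed * 100 / info->total);
}
```

Code that reports progress would then only need to know about jobInfo,
whichever kind of job produced it.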

Currently (in libxl and qemu) jobs are a part of the domain struct. I think
that jobs should be moved out of the domain struct, with domains instead using
job ids to keep track of their currently running jobs. I'm still new to
libvirt, so if this doesn't make sense and keeping job objects attached to
domains is the better approach, that's fine.

I think at the minimum each job object should contain: the id of the thread
running the job, the type of job, the job id, a condition variable to
coordinate jobs, and information about the job, either as a separate job info
object or as part of the job object itself. The job should also contain a
reference to the domain or storage it is associated with.
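
As a starting point for discussion, the fields listed above could be laid out
roughly like this. This is only a sketch: jobObj and its functions are
hypothetical names, the mutex is my own addition (a condition variable needs
one to wait on), and the owning domain or storage object is held as an
untyped borrowed pointer purely to keep the example self-contained.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical layout; not an existing libvirt type. */

typedef enum {
    JOB_OWNER_DOMAIN,
    JOB_OWNER_STORAGE
} jobOwnerKind;

typedef struct {
    int id;                        /* job id */
    int type;                      /* type of job */
    pthread_t thread;              /* thread running the job */
    int hasThread;                 /* 0 until a thread claims the job */
    pthread_mutex_t lock;          /* assumption: protects the fields below */
    pthread_cond_t cond;           /* used to coordinate jobs */
    jobOwnerKind ownerKind;        /* what 'object' points at */
    void *object;                  /* borrowed reference to domain/storage */
    unsigned long long processed;  /* job info kept inline here */
    unsigned long long total;
} jobObj;

/* Initialize a caller-provided job object. Returns 0 on success, -1 on error. */
int jobObjInit(jobObj *job, int id, int type, jobOwnerKind kind, void *object)
{
    memset(job, 0, sizeof(*job));
    if (pthread_mutex_init(&job->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&job->cond, NULL) != 0) {
        pthread_mutex_destroy(&job->lock);
        return -1;
    }
    job->id = id;
    job->type = type;
    job->ownerKind = kind;
    job->object = object;
    return 0;
}

/* Record the calling thread as the one running the job. */
void jobObjClaim(jobObj *job)
{
    pthread_mutex_lock(&job->lock);
    job->thread = pthread_self();
    job->hasThread = 1;
    pthread_mutex_unlock(&job->lock);
}

/* Release the synchronization primitives; the job itself is caller-owned. */
void jobObjCleanup(jobObj *job)
{
    pthread_cond_destroy(&job->cond);
    pthread_mutex_destroy(&job->lock);
}
```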

There are a few basic functions that should definitely be part of the API:
initialize a job, free a job, start a job, end a job, abort a job and get info
on a job. It would be nice to be able to suspend a job and to change the
currently running job as well. That's what I can come up with, but I don't have
much experience in libvirt so if there are other features that make sense they
can be added as well.
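
The functions listed above could be sketched as a small state machine. The
state names, function names and the allowed transitions are all my own
assumptions for illustration, and the sketch leaves out threading and the
actual work a job does.

```c
#include <stdlib.h>

/* Hypothetical lifecycle API; not existing libvirt functions. */

typedef enum {
    JOB_STATE_NEW,
    JOB_STATE_RUNNING,
    JOB_STATE_SUSPENDED,
    JOB_STATE_COMPLETED,
    JOB_STATE_ABORTED
} jobState;

typedef struct {
    int id;
    jobState state;
} job;

/* Allocate and initialize a job; returns NULL on allocation failure. */
job *jobNew(int id)
{
    job *j = malloc(sizeof(*j));
    if (!j)
        return NULL;
    j->id = id;
    j->state = JOB_STATE_NEW;
    return j;
}

void jobFree(job *j)
{
    free(j);
}

/* Move to 'to' only if the current state is in 'fromMask' (one bit per state). */
static int jobTransition(job *j, unsigned int fromMask, jobState to)
{
    if (!(fromMask & (1u << j->state)))
        return -1;
    j->state = to;
    return 0;
}

int jobStart(job *j)
{
    return jobTransition(j, 1u << JOB_STATE_NEW, JOB_STATE_RUNNING);
}

int jobSuspend(job *j)
{
    return jobTransition(j, 1u << JOB_STATE_RUNNING, JOB_STATE_SUSPENDED);
}

int jobResume(job *j)
{
    return jobTransition(j, 1u << JOB_STATE_SUSPENDED, JOB_STATE_RUNNING);
}

int jobEnd(job *j)
{
    return jobTransition(j, 1u << JOB_STATE_RUNNING, JOB_STATE_COMPLETED);
}

/* A job can be aborted while running or while suspended. */
int jobAbort(job *j)
{
    return jobTransition(j,
                         (1u << JOB_STATE_RUNNING) | (1u << JOB_STATE_SUSPENDED),
                         JOB_STATE_ABORTED);
}

jobState jobGetState(const job *j)
{
    return j->state;
}
```

Each call returns 0 on success and -1 if the job is not in a state where the
operation makes sense, so for example ending a job that was never started
fails instead of silently succeeding.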

Finally (as far as I can think of right now) there is the question of parallel
jobs. Currently the qemu driver allows some jobs to be run in parallel by
allowing a job to be run asynchronously. This async job has a mask of job types
associated with it that determines what types of regular jobs can be run during
it. However, I would like to allow an arbitrary number of jobs to be run at
once (I'm not sure how useful this would be, but it seems best not to impose
hard limits on things). The easiest way to deal with this is to just ignore it
and put the burden of synchronizing jobs on the drivers. This is obviously a
bad solution. Another way would be the way it is currently done in the qemu
driver: have a mask of job types associated with each domain/storage, updated
whenever a job is started or ended, which dictates what types of jobs can be
started. Regardless of how this is done, it will require support from the
driver/domain/storage that each job is associated with.
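
Here is a minimal sketch of the mask idea extended to any number of
concurrent jobs. It is not the qemu driver's code: the job types, the
compatibility table and all function names are made up for illustration.
Each job type declares which types may run alongside it, and the owner's
mask is the intersection of those declarations over all running jobs.

```c
#include <string.h>

/* Hypothetical job types and rules; not taken from any libvirt driver. */

typedef enum {
    JOB_NONE = 0,
    JOB_QUERY,
    JOB_MODIFY,
    JOB_DESTROY,
    JOB_LAST
} jobType;

#define JOB_MASK(t)  (1u << ((t) - 1))
#define JOB_MASK_ALL (JOB_MASK(JOB_LAST) - 1)

/* For each job type, the set of types that may run at the same time. */
static const unsigned int compat[JOB_LAST] = {
    [JOB_QUERY]   = JOB_MASK(JOB_QUERY) | JOB_MASK(JOB_MODIFY),
    [JOB_MODIFY]  = JOB_MASK(JOB_QUERY),
    [JOB_DESTROY] = 0,
};

/* Per-domain or per-storage bookkeeping. */
typedef struct {
    unsigned int allowed;   /* bit set: a job of that type may start now */
    int active[JOB_LAST];   /* number of running jobs of each type */
} jobOwner;

/* Recompute the mask as the intersection over all running job types. */
static void jobOwnerUpdate(jobOwner *o)
{
    unsigned int mask = JOB_MASK_ALL;
    for (int t = JOB_QUERY; t < JOB_LAST; t++) {
        if (o->active[t] > 0)
            mask &= compat[t];
    }
    o->allowed = mask;
}

void jobOwnerInit(jobOwner *o)
{
    memset(o, 0, sizeof(*o));
    o->allowed = JOB_MASK_ALL;
}

/* Try to start a job; returns 0 if admitted, -1 if the mask forbids it. */
int jobOwnerBegin(jobOwner *o, jobType t)
{
    if (t <= JOB_NONE || t >= JOB_LAST)
        return -1;
    if (!(o->allowed & JOB_MASK(t)))
        return -1;
    o->active[t]++;
    jobOwnerUpdate(o);
    return 0;
}

/* Mark one job of the given type as finished and widen the mask again. */
void jobOwnerEnd(jobOwner *o, jobType t)
{
    if (t <= JOB_NONE || t >= JOB_LAST || o->active[t] == 0)
        return;
    o->active[t]--;
    jobOwnerUpdate(o);
}
```

With this table, any number of query jobs can overlap with a single modify
job, while a destroy job only starts when nothing else is running. A real
version would need locking and a way to wait for the mask to change rather
than failing immediately.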

Tucker DiNapoli
