Re: Job Control API [RFC]

Martin Kletzander <mkletzan@xxxxxxxxxx> · Mon, 26 May 2014 16:31:30 +0200

On Wed, May 21, 2014 at 11:13:06AM -0400, Tucker DiNapoli wrote:
My name is Tucker DiNapoli and I am working on implementing job control for
the storage driver for the Google Summer of Code. The first step in doing
this is creating and implementing a unified API for job control.

Currently there are several places where various aspects of job control are
implemented. The qemu and libxl drivers both contain internal implementations
for job control on domain level jobs, with the qemu driver containing support
for asynchronous jobs. There is also code in the libvirt.c file for running
block jobs and for querying domain jobs for information.

I would like for the job control api to be as independent of different
drivers as possible since it will need to be used with storage drivers as
well as different virtualization drivers.

This definitely has to be independent in the code.  The less anyone
suffers with adding job control to other drivers, the better.

I imagine most of the api will revolve around a job object, and I think it's
important to decide what exactly should go in this job object.

This is a response to my first post on the mailing list and I think it is a
good idea.

I'd _really_ like to see a common notion of a 'job id' that EVERY job
(whether domain-level, like migration; or block-level, like
commit/pull/rebase; or storage-level, like your new proposed storage
jobs) shares a common job namespace.  The job id is a positive integer.
Existing APIs will have to be retrofitted into the new job id notion;
any action that starts a long-running job that currently returns 0 on
success could be changed to return a positive job id; or we may need a
new API that queries the notion of the 'current job' (the job most
recently started) or even to set the 'current job' to a different job
id.  We'll need new API for querying a job by id, and to be most
portable, we should do job reporting via virTypedParameter
(virDomainGetJobInfo and virDomainGetBlockJobInfo are hardcoded into
returning a struct, so they are non-extensible; virDomainGetJobStats
almost did it right, except that the user has to call it twice, once to
learn how large to allocate, and again to pass in pre-allocated memory -
the ideal API would allocate the memory on a single call).
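
To make the shared job namespace and the single-call query a bit more
concrete, here is a rough sketch. Every name in it (jobStart, jobGetStats,
jobParam, the fixed-size table) is made up for illustration and is not
existing libvirt API; jobParam is only a simplified stand-in for
virTypedParameter, and locking is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for virTypedParameter. */
typedef struct {
    char field[32];
    unsigned long long value;
} jobParam;

/* All kinds of jobs draw their ids from the same namespace. */
typedef enum { JOB_KIND_DOMAIN, JOB_KIND_BLOCK, JOB_KIND_STORAGE } jobKind;

typedef struct {
    int id;                       /* positive when in use, 0 when free */
    jobKind kind;
    unsigned long long processed;
    unsigned long long total;
} jobEntry;

#define JOB_TABLE_SIZE 16
static jobEntry jobTable[JOB_TABLE_SIZE];
static int nextJobId = 1;

/* Start a job: returns a positive job id, or -1 if the table is full. */
static int jobStart(jobKind kind, unsigned long long total)
{
    for (size_t i = 0; i < JOB_TABLE_SIZE; i++) {
        if (jobTable[i].id == 0) {
            jobTable[i].id = nextJobId++;
            jobTable[i].kind = kind;
            jobTable[i].processed = 0;
            jobTable[i].total = total;
            return jobTable[i].id;
        }
    }
    return -1;
}

/* Query a job by id in a single call. On success the function allocates
 * *params (the caller releases it with free()) and returns 0; an unknown
 * id or an allocation failure returns -1. */
static int jobGetStats(int id, jobParam **params, int *nparams)
{
    assert(params != NULL && nparams != NULL);
    if (id <= 0)
        return -1;
    for (size_t i = 0; i < JOB_TABLE_SIZE; i++) {
        if (jobTable[i].id == id) {
            jobParam *p = calloc(2, sizeof(*p));
            if (p == NULL)
                return -1;
            strcpy(p[0].field, "processed");
            p[0].value = jobTable[i].processed;
            strcpy(p[1].field, "total");
            p[1].value = jobTable[i].total;
            *params = p;
            *nparams = 2;
            return 0;
        }
    }
    return -1;
}
```

The caller never has to ask for the parameter count first; it just passes
two out-pointers and frees the array when done.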

Currently there are separate types for block job info and job info. If
possible, I would like to merge these into a common job info type, and
perhaps make this a part of the job object itself.

Anything that *can* be part of the job object itself, *should* be part
of it.  However, some things might require duplicating some info, in
which case applying common sense should suffice.

Currently (in libxl and qemu) jobs are a part of the domain struct. I think
that jobs should be moved out of the domain struct, with domains instead
using job ids to keep track of currently running jobs. I'm still new to
libvirt, so if this doesn't make sense and keeping job objects attached to
domains makes more sense, that's fine.

I think at the minimum each job object should contain: the id of the thread
running the job, the type of job, the job id, a condition variable to
coordinate jobs, and information about the job, either as a separate job
info object or as part of the job object itself. The job should also contain
a reference to the domain or storage it is associated with.

I had an idea that a job could have a list of domains/volumes/etc., but
those could relate to different (even not remotely connected)
drivers.  Would this be solved just with a simple "unknown job id"
error when connected with another driver?

There are a few basic functions that should definitely be part of the api:
initialize a job, free a job, start a job, end a job, abort a job and get
info on a job. It would be nice to be able to suspend a job and to change
the currently running job as well. That's what I can come up with, but I
don't have much experience in libvirt, so if there are other features that
make sense they can be added as well.

All the features may make sense, but lots of them might not be
available when the underlying tool doesn't support them.  If it's a
simple qemu-img process, you can suspend it, you can even kill it, but
how graceful is that when it is handling images read-write?  That's a
question...  Anyway, these things should probably be callbacks that
will be added by the particular driver when initializing the job and
handled there.
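
A minimal sketch of the callback idea, assuming a per-job table of
function pointers where a missing entry means "not supported".  The
names (jobOps, jobHandle, jobSuspend, the JOB_ERR_* codes and the
example driver) are invented here and are not taken from any existing
code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct jobHandle jobHandle;

/* Callbacks the driver fills in when it initializes the job.
 * A NULL entry means the underlying tool cannot do that operation. */
typedef struct {
    int (*abort)(jobHandle *job);
    int (*suspend)(jobHandle *job);
    int (*resume)(jobHandle *job);
} jobOps;

struct jobHandle {
    int id;
    int aborted;
    int suspended;
    const jobOps *ops;
};

enum { JOB_OK = 0, JOB_ERR_FAILED = -1, JOB_ERR_UNSUPPORTED = -2 };

/* Generic entry points: dispatch to the driver, or report that the
 * operation is unsupported instead of guessing at a fallback. */
static int jobAbort(jobHandle *job)
{
    assert(job != NULL);
    if (job->ops == NULL || job->ops->abort == NULL)
        return JOB_ERR_UNSUPPORTED;
    return job->ops->abort(job);
}

static int jobSuspend(jobHandle *job)
{
    assert(job != NULL);
    if (job->ops == NULL || job->ops->suspend == NULL)
        return JOB_ERR_UNSUPPORTED;
    return job->ops->suspend(job);
}

/* Example driver that can abort its jobs but not suspend them. */
static int exampleAbort(jobHandle *job)
{
    job->aborted = 1;
    return JOB_OK;
}

static const jobOps exampleOps = {
    .abort = exampleAbort,
    /* .suspend and .resume stay NULL: unsupported by this driver */
};
```

With this shape the common code never needs to know which tool sits
underneath; a driver that cannot suspend safely simply leaves the
pointer unset.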

Finally (as far as I can think of right now) is the idea of parallel jobs.
Currently the qemu driver allows some jobs to be run in parallel by allowing
a job to be run asynchronously. This async job has a mask of job types
associated with it that determines what types of regular jobs can be run
during it. However, I would like to allow an arbitrary number of jobs to be
run at once (I'm not sure how useful this would be, but it seems best not to
impose hard limits on things). The easiest way to deal with this is to just
ignore it and put the burden of synchronizing jobs on the drivers. This is
obviously a bad solution. Another way would be the way it is currently done
in the qemu driver: have a mask of job types associated with each
domain/storage, updated when a job is started or ended, which dictates what
types of jobs can be started. Regardless of how this is done, it will
require support from the driver/domain/storage that each job is associated
with.

And again, this can be decided by a mask or even a callback to the
driver as well.
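
The mask variant could look roughly like the sketch below.  The job
types, the jobBlocks table and the jobGate functions are all
hypothetical; the table is just an illustrative policy, and a real
driver would supply its own (or a callback in its place).  Locking is
omitted.

```c
#include <assert.h>

typedef enum { JOB_QUERY = 0, JOB_MODIFY, JOB_COPY, JOB_TYPE_LAST } jobType;

#define JOB_MASK(type) (1u << (type))

/* For each job type, the set of job types that must not start while a
 * job of that type is running.  Illustrative policy only. */
static const unsigned int jobBlocks[JOB_TYPE_LAST] = {
    [JOB_QUERY]  = 0,
    [JOB_MODIFY] = JOB_MASK(JOB_MODIFY) | JOB_MASK(JOB_COPY),
    [JOB_COPY]   = JOB_MASK(JOB_MODIFY) | JOB_MASK(JOB_COPY),
};

/* Per-domain/per-storage state: how many jobs of each type run now. */
typedef struct {
    unsigned int running[JOB_TYPE_LAST];
} jobGate;

/* Union of the masks of everything currently running. */
static unsigned int jobGateBlocked(const jobGate *gate)
{
    unsigned int blocked = 0;
    for (int t = 0; t < JOB_TYPE_LAST; t++) {
        if (gate->running[t] > 0)
            blocked |= jobBlocks[t];
    }
    return blocked;
}

/* Try to start a job of the given type: 0 if allowed, -1 if a running
 * job currently forbids that type. */
static int jobGateBegin(jobGate *gate, jobType type)
{
    assert((int)type < JOB_TYPE_LAST);
    if (jobGateBlocked(gate) & JOB_MASK(type))
        return -1;
    gate->running[type]++;
    return 0;
}

/* Mark one job of the given type as finished. */
static void jobGateEnd(jobGate *gate, jobType type)
{
    assert((int)type < JOB_TYPE_LAST && gate->running[type] > 0);
    gate->running[type]--;
}
```

Because the gate counts running jobs per type instead of tracking a
single async job, any number of compatible jobs can run at once.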

Martin

Tucker DiNapoli