On Wed, May 21, 2014 at 11:13:06AM -0400, Tucker DiNapoli wrote:
My name is Tucker DiNapoli and I am working on implementing job control for the storage driver for the Google Summer of Code. The first step in doing this is creating and implementing a unified api for job control. Currently there are several places where various aspects of job control are implemented. The qemu and libxl drivers both contain internal implementations of job control for domain-level jobs, with the qemu driver also supporting asynchronous jobs. There is also code in the libvirt.c file for running block jobs and for querying domain jobs for information. I would like the job control api to be as independent of different drivers as possible, since it will need to be used with storage drivers as well as different virtualization drivers.
This definitely has to be independent in the code. The less anyone suffers with adding job control to other drivers, the better.
I imagine most of the api will revolve around a job object, and I think it's important to decide what exactly should go in this job object. This is a response to my first post on the mailing list, and I think it is a good idea:
I'd _really_ like to see a common notion of a 'job id' that EVERY job (whether domain-level, like migration; or block-level, like commit/pull/rebase; or storage-level, like your new proposed storage jobs) shares a common job namespace. The job id is a positive integer. Existing APIs will have to be retrofitted into the new job id notion; any action that starts a long-running job that currently returns 0 on success could be changed to return a positive job id; or we may need a new API that queries the notion of the 'current job' (the job most recently started) or even to set the 'current job' to a different job id. We'll need new API for querying a job by id, and to be most portable, we should do job reporting via virTypedParameter (virDomainGetJobInfo and virDomainGetBlockJobInfo are hardcoded into returning a struct, so they are non-extensible; virDomainGetJobStats almost did it right, except that the user has to call it twice, once to learn how large to allocate, and again to pass in pre-allocated memory - the ideal API would allocate the memory on a single call).
Currently there are separate types for block job info and job info. If possible, I would like to merge these into a common job info type, and perhaps make this a part of the job object itself.
Anything that *can* be part of the job object itself *should* be part of it. However, some things might require duplicating some info, in which case applying common sense should suffice.
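To make the job id suggestion above more concrete, here is a rough sketch of what a shared id namespace plus a single-call stats query might look like. Everything in it is an assumption made for illustration: `JobKind`, `JobParam`, `job_start` and `job_get_stats` are hypothetical names, not existing libvirt API, and `JobParam` is only a simplified stand-in for a typed parameter, not the real virTypedParameter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical job kinds that all draw ids from one namespace. */
typedef enum { JOB_KIND_DOMAIN, JOB_KIND_BLOCK, JOB_KIND_STORAGE } JobKind;

/* Simplified stand-in for a typed parameter: a name and an unsigned value. */
typedef struct {
    char name[32];
    unsigned long long value;
} JobParam;

typedef struct {
    int id;                       /* positive, unique across all job kinds */
    JobKind kind;
    unsigned long long processed;
    unsigned long long total;
} Job;

#define MAX_JOBS 16
static Job jobs[MAX_JOBS];
static int njobs;
static int next_id = 1;

/* Start a job: returns a positive job id on success, -1 on failure. */
int job_start(JobKind kind, unsigned long long total)
{
    if (njobs == MAX_JOBS)
        return -1;
    Job *j = &jobs[njobs++];
    j->id = next_id++;
    j->kind = kind;
    j->processed = 0;
    j->total = total;
    return j->id;
}

/* Look a job up by id, regardless of its kind. */
static Job *job_find(int id)
{
    for (int i = 0; i < njobs; i++)
        if (jobs[i].id == id)
            return &jobs[i];
    return NULL;
}

/* Single-call stats query: allocates *params itself, the caller frees it.
 * Returns 0 on success, -1 for an unknown job id or allocation failure. */
int job_get_stats(int id, JobParam **params, int *nparams)
{
    Job *j = job_find(id);
    if (!j)
        return -1;
    JobParam *p = calloc(2, sizeof(*p));
    if (!p)
        return -1;
    strcpy(p[0].name, "processed");
    p[0].value = j->processed;
    strcpy(p[1].name, "total");
    p[1].value = j->total;
    *params = p;
    *nparams = 2;
    return 0;
}
```

The point of the sketch is only that the caller never has to ask for a size first, and that adding a new statistic means adding a new named entry rather than changing a struct layout.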
Currently (in libxl and qemu) jobs are a part of the domain struct. I think that jobs should be moved out of the domain struct, with domains instead using job ids to keep track of their currently running jobs. I'm still new to libvirt, so if this doesn't make sense and keeping job objects attached to domains makes more sense, that's fine. I think at the minimum each job object should contain: the id of the thread running the job, the type of job, the job id, a condition variable to coordinate jobs, and information about the job, either as a separate job info object or as part of the job object itself. The job should also contain a reference to the domain or storage it is associated with.
I had an idea that a job could have a list of domains/volumes/etc., but those could relate to different (even not remotely connected) drivers. Would this be solved just with a simple "unknown job id" error when connected to another driver?
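As a rough illustration of the minimum set of fields listed above, a job object could look something like the following. This is only a sketch under assumptions: the names (`JobObj`, `JobInfo`, `job_obj_begin`, and so on) are hypothetical, plain pthreads are used instead of libvirt's own threading wrappers, and the associated domain or storage is kept as an opaque pointer.

```c
#include <pthread.h>

/* Hypothetical job types; JOB_NONE means no job is running. */
typedef enum { JOB_NONE = 0, JOB_QUERY, JOB_MODIFY, JOB_ABORT } JobType;

/* Progress information kept inside the job object itself. */
typedef struct {
    unsigned long long processed;
    unsigned long long total;
} JobInfo;

typedef struct {
    int id;                 /* job id */
    JobType type;           /* type of the job currently running */
    pthread_t owner;        /* thread running the job, valid while type != JOB_NONE */
    pthread_mutex_t lock;   /* protects the fields of this object */
    pthread_cond_t cond;    /* signalled when the running job ends */
    JobInfo info;
    void *target;           /* opaque reference to the domain or storage object */
} JobObj;

/* Initialize a job object; returns 0 on success, -1 on failure. */
int job_obj_init(JobObj *job, int id, void *target)
{
    job->id = id;
    job->type = JOB_NONE;
    job->info.processed = 0;
    job->info.total = 0;
    job->target = target;
    if (pthread_mutex_init(&job->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&job->cond, NULL) != 0) {
        pthread_mutex_destroy(&job->lock);
        return -1;
    }
    return 0;
}

/* Wait until no job is running, then record the caller as the owner. */
void job_obj_begin(JobObj *job, JobType type)
{
    pthread_mutex_lock(&job->lock);
    while (job->type != JOB_NONE)
        pthread_cond_wait(&job->cond, &job->lock);
    job->type = type;
    job->owner = pthread_self();
    pthread_mutex_unlock(&job->lock);
}

/* Mark the job as finished and wake one waiter. */
void job_obj_end(JobObj *job)
{
    pthread_mutex_lock(&job->lock);
    job->type = JOB_NONE;
    pthread_cond_signal(&job->cond);
    pthread_mutex_unlock(&job->lock);
}

/* Release the synchronization primitives. */
void job_obj_free(JobObj *job)
{
    pthread_cond_destroy(&job->cond);
    pthread_mutex_destroy(&job->lock);
}
```

Because `target` is opaque here, a lookup by job id through the wrong driver would simply not find the job, which fits the "unknown job id" idea.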
There are a few basic functions that should definitely be part of the api: initialize a job, free a job, start a job, end a job, abort a job and get info on a job. It would be nice to be able to suspend a job and to change the currently running job as well. That's what I can come up with, but I don't have much experience in libvirt so if there are other features that make sense they can be added as well.
All the features may make sense, but lots of them might not be available when the underlying tool doesn't support them. If it's a simple qemu-img process, you can suspend it, you can even kill it, but how graceful is that when it is handling images read-write? That's a question... Anyway, these things should probably be callbacks that will be added by the particular driver when initializing the job and handled there.
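The callback idea could be sketched roughly like this: the driver fills in a table of operations when it initializes the job, and the generic code reports "unsupported" for anything the driver left out. All names here (`JobOps`, `job_suspend`, `JOB_ERR_UNSUPPORTED`, the demo driver) are hypothetical and only meant to show the shape of such a design.

```c
#include <stddef.h>

typedef struct Job Job;

/* Operations a driver may provide; a NULL entry means "not supported". */
typedef struct {
    int (*suspend)(Job *job);
    int (*resume)(Job *job);
    int (*abort)(Job *job);
} JobOps;

struct Job {
    int id;
    int suspended;
    int aborted;
    const JobOps *ops;   /* set by the driver when the job is initialized */
};

#define JOB_ERR_UNSUPPORTED (-2)

/* Generic entry points: dispatch to the driver, or fail cleanly. */
int job_suspend(Job *job)
{
    if (!job->ops || !job->ops->suspend)
        return JOB_ERR_UNSUPPORTED;
    return job->ops->suspend(job);
}

int job_resume(Job *job)
{
    if (!job->ops || !job->ops->resume)
        return JOB_ERR_UNSUPPORTED;
    return job->ops->resume(job);
}

int job_abort(Job *job)
{
    if (!job->ops || !job->ops->abort)
        return JOB_ERR_UNSUPPORTED;
    return job->ops->abort(job);
}

/* Hypothetical driver whose underlying tool can be aborted but not suspended. */
static int demo_abort(Job *job)
{
    job->aborted = 1;
    return 0;
}

const JobOps demo_ops = {
    .suspend = NULL,
    .resume = NULL,
    .abort = demo_abort,
};
```

With this layout the question of whether suspending is safe for a given tool stays entirely inside the driver that registers the callback.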
Finally (as far as I can think of right now) there is the idea of parallel jobs. Currently the qemu driver allows some jobs to be run in parallel by allowing a job to be run asynchronously. This async job has a mask of job types associated with it that determines what types of regular jobs can be run during it. However, I would like to allow an arbitrary number of jobs to be run at once (I'm not sure how useful this would be, but it seems best not to impose hard limits on things). The easiest way to deal with this is to just ignore it and put the burden of synchronizing jobs on the drivers. This is obviously a bad solution. Another way would be the way it is currently done in the qemu driver: have a mask of job types associated with each domain/storage, updated whenever a job is started or ended, which dictates what types of jobs can be started. Regardless of how this is done, it will require support from the driver/domain/storage that each job is associated with.
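One possible way to read the mask approach for an arbitrary number of jobs is sketched below: each domain/storage keeps a count of running jobs per type and a mask of types that may start, recomputed on every start and end. The job types, the compatibility table and the names (`JobGate`, `gate_start`, `gate_end`) are assumptions made up for this example and do not mirror the qemu driver's actual code.

```c
#include <string.h>

/* Hypothetical job types. */
enum { JOB_QUERY, JOB_MODIFY, JOB_DESTROY, JOB_TYPE_COUNT };

#define JOB_BIT(t) (1u << (t))
#define JOB_ALL    (JOB_BIT(JOB_TYPE_COUNT) - 1u)

/* For each type, the mask of types that may run alongside it (assumed). */
static const unsigned int compat[JOB_TYPE_COUNT] = {
    [JOB_QUERY]   = JOB_BIT(JOB_QUERY) | JOB_BIT(JOB_MODIFY),
    [JOB_MODIFY]  = JOB_BIT(JOB_QUERY),
    [JOB_DESTROY] = 0,
};

/* Per-domain or per-storage state. */
typedef struct {
    unsigned int count[JOB_TYPE_COUNT];  /* running jobs of each type */
    unsigned int allowed;                /* types that may start right now */
} JobGate;

/* The allowed mask is the intersection of what every running job permits. */
static void gate_recompute(JobGate *g)
{
    unsigned int mask = JOB_ALL;
    for (int t = 0; t < JOB_TYPE_COUNT; t++)
        if (g->count[t] > 0)
            mask &= compat[t];
    g->allowed = mask;
}

void gate_init(JobGate *g)
{
    memset(g->count, 0, sizeof(g->count));
    g->allowed = JOB_ALL;
}

/* Returns 0 if the job may start (and records it), -1 otherwise. */
int gate_start(JobGate *g, int type)
{
    if (type < 0 || type >= JOB_TYPE_COUNT)
        return -1;
    if (!(g->allowed & JOB_BIT(type)))
        return -1;
    g->count[type]++;
    gate_recompute(g);
    return 0;
}

/* Records that a job of the given type has ended. */
void gate_end(JobGate *g, int type)
{
    if (type < 0 || type >= JOB_TYPE_COUNT || g->count[type] == 0)
        return;
    g->count[type]--;
    gate_recompute(g);
}
```

The compatibility table could equally be replaced by a callback into the driver; the gate itself would not need to change.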
And again, this can be decided by a mask or even a callback to the driver as well.
Martin
Tucker DiNapoli
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list