On Mon, May 20, 2013 at 09:07:17PM -0600, Eric Blake wrote: > https://bugzilla.redhat.com/show_bug.cgi?id=965169 documents a > problem starting domains when cgroups are enabled; I was able > to reliably reproduce the race about 5% of the time when I added > hooks to domain startup by 3 seconds (as that seemed to be about > the length of time that qemu created and then closed a temporary > thread, probably related to aio handling of initially opening > a disk image). > > There are some inherent TOCTTOU races when moving tasks between > kernel cgroups, precisely because threads can be created or > completed in the window between when we read a thread id from the > source and when we write to the destination. As the goal of > virCgroupMoveTask is merely to move ALL tasks into the new > cgroup, it is sufficient to iterate until no more threads are > being created in the old group, and ignoring any threads that > die before we can move them. > > It would be nicer to start the threads in the right cgroup to > begin with, but by default, all child threads are created in > the same cgroup as their parent, and we don't want vcpu child > threads in the emulator cgroup, so I don't see any good way > of avoiding the move. It would also be nice if the kernel were > to implement something like rename() as a way to atomically move > a group of threads from one cgroup to another, instead of forcing > a window where we have to read and parse the source, then format > and write back into the destination. > > * src/util/vircgroup.c (virCgroupAddTaskStrController): Ignore > ESRCH, because a thread ended between read and write attempts. > (virCgroupMoveTask): Loop until all threads have moved. > > Signed-off-by: Eric Blake <eblake@xxxxxxxxxx> > --- > src/util/vircgroup.c | 28 ++++++++++++++++++++-------- > 1 file changed, 20 insertions(+), 8 deletions(-) > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c > index 07ea2c3..6d44694 100644 > --- a/src/util/vircgroup.c > +++ b/src/util/vircgroup.c > @@ -1037,7 +1037,11 @@ static int virCgroupAddTaskStrController(virCgroupPtr group, > goto cleanup; > > rc = virCgroupAddTaskController(group, p, controller); > - if (rc != 0) > + /* A thread that exits between when we first read the source > + * tasks and now is not fatal. */ > + if (rc == -ESRCH) > + rc = 0; > + else if (rc != 0) > goto cleanup; > > next = strchr(cur, '\n'); > @@ -1074,15 +1078,23 @@ int virCgroupMoveTask(virCgroupPtr src_group, virCgroupPtr dest_group) > !dest_group->controllers[i].mountPoint) > continue; > > - rc = virCgroupGetValueStr(src_group, i, "tasks", &content); > - if (rc != 0) > - return rc; > + /* New threads are created in the same group as their parent; > + * but if a thread is created after we first read we aren't > + * aware that it needs to move. Therefore, we must iterate > + * until content is empty. */ > + while (1) { > + rc = virCgroupGetValueStr(src_group, i, "tasks", &content); > + if (rc != 0) > + return rc; > + if (!*content) > + break; > > - rc = virCgroupAddTaskStrController(dest_group, content, i); > - if (rc != 0) > - goto cleanup; > + rc = virCgroupAddTaskStrController(dest_group, content, i); > + if (rc != 0) > + goto cleanup; > > - VIR_FREE(content); > + VIR_FREE(content); > + } > } > > cleanup: ACK, this looks like the best we can do here. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list