On 12/17/2014 08:06 AM, Martin Kletzander wrote: > On Wed, Dec 17, 2014 at 12:00:36AM -0700, Eric Blake wrote: >> On 12/16/2014 11:51 PM, Eric Blake wrote: >>> On 12/15/2014 12:58 AM, Martin Kletzander wrote: >>>> Instead of setting the value of cpuset.mems once when the domain starts >>>> and then re-calculating the value every time we need to change the >>>> child >>>> cgroup values, leave the cgroup alone and rather set the child data >>>> every time there is new cgroup created. We don't leave any task in the >>>> parent group anyway. This will ease both current and future code. >>>> >>>> Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> >>>> --- >>>> src/qemu/qemu_cgroup.c | 67 >>>> ++++++++++++++++++++++++++++++++++++++++++++++++-- >>>> src/qemu/qemu_driver.c | 59 >>>> +++++++++++++++----------------------------- >>>> 2 files changed, 85 insertions(+), 41 deletions(-) >>> >>> This patch causes libvirtd to segfault on startup: >> >> More particularly, I'm doing an upgrade from older libvirtd to this >> version, while leaving a transient domain running that was started by >> the older libvirtd. Hope that helps you narrow in on the problem. Weird - At the time I made the report, I ran 'git bisect' and reliably reproduced the crash across multiple libvirtd restarts (a restart for each new build while trying to nail down the culprit commit), as long as I left that transient domain running. >> > > I tried that and it works for me. And I tried various domains, both > heavily cgroup dependent and simple ones. But now that I've rebooted, and then proceeded to do incremental builds from both before and after the patch, I can't reproduce the crash. Although my formula for creating my transient domain was the same both yesterday and today, there may be some difference in the version of libvirtd that was running at the time I created the domain that then affected the XML affecting the libvirtd restarts. > >> Reverting 86759e and af2a1f0 was sufficient to get me going again >> locally, but I'm not going to push my reversions until you've first had >> a chance to address the regression. >> >>> #2 0x00007ffff7405673 in virCgroupHasEmptyTasks (cgroup=0x0, >>> controller=2) >>> at util/vircgroup.c:3935 > > From this line it looks like priv->cgroup was not initialized. I did > not add a check for that, so that may be the cause. I'll send a patch > soon. > > But I wonder how did you manage to do that, is that a session libvirtd > you restarted? Otherwise how come virCgroupNewDetectMachine() didn't > fill the cgroup? I'm not spotting anything obvious why it wasn't initialized at the time I reproduced the crash, nor why I can't reproduce it now. Hopefully it's not a lurking time bomb. If it happens again, I'll definitely report it; but for now, without a reliable reproduction, it could easily have been something caused on my end (since it is my dev machine, it may have been caused by a half-baked patch on my end that I was testing at the time I created the transient domain, where using only upstream patches wouldn't see the issue). So for now, don't worry too hard if you can't find it either. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list