[Adding libvirt list...] Ian Jackson wrote: > Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"): > >> BTW, I only see the crash when the save/restore script is running. I >> stopped the other scripts and domains, running only save/restore on a >> single domain, and see the crash rather quickly (within 10 iterations). >> > > I'll look at the libvirt code, but: > > With a recurring timeout, how can you ever know it's cancelled ? > There might be threads out there, which don't hold any locks, which > are in the process of executing a callback for a timeout. That might > be arbitrarily delayed from the pov of the rest of the program. > > E.g.: > > Thread A Thread B > > invoke some libxl operation > X do some libxl stuff > X register timeout (libxl) > XV record timeout info > X do some more libxl stuff > ... > X do some more libxl stuff > X deregister timeout (libxl internal) > X converted to request immediate timeout > XV record new timeout info > X release libvirt event loop lock > entering libvirt event loop > V observe timeout is immediate > V need to do callback > call libxl driver > > entering libvirt event loop > V observe timeout is immediate > V need to do callback > call libxl driver > call libxl > X libxl sees timeout is live > X libxl does libxl stuff > libxl driver deregisters > V record lack of timeout > free driver's timeout struct > call libxl > X libxl sees timeout is dead > X libxl does nothing > libxl driver deregisters > V CRASH due to deregistering > V already-deregistered timeout > > If this is how things are, then I think there is no sane way to use > libvirt's timeouts (!) > Looking at the libvirt code again, it seems a single thread services the event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I see the same thread ID in all the timer and fd callbacks. One of the libvirt core devs can correct me if I'm wrong. Regards, Jim > In principle I guess the driver could keep its per-timeout structs > around forever and remember whether they've been deregistered or not. > Each one would have to have a lock in it. > > But if you think about it, if you have 10 threads all running the > event loop and you set a timeout to zero, doesn't that mean that every > thread's event loop should do the timeout callback as fast as it can ? > That could be a lot of wasted effort. > > The best solution would appear to be to provide a non-recurring > callback. > > >> I'm not so thrilled with the timeout handling code in the libvirt libxl >> driver. The driver maintains a list of active timeouts because IIRC, >> there were cases when the driver received timeout deregistrations when >> calling libxl_ctx_free, at which point some of the associated structures >> were freed. The idea was to call libxl_osevent_occurred_timeout on any >> active timeouts before freeing libxlDomainObjPrivate and its contents. >> > > libxl does deregister fd callbacks in libxl_ctx_free. > > But libxl doesn't currently "deregister" any timeouts in > libxl_ctx_free; indeed it would be a bit daft for it to do so as at > libxl_ctx_free there are no aos running so there would be nothing to > time out. > > But there is a difficulty with timeouts which libxl has set to occur > immediately but which have not yet actually had the callback. The the > application cannot call libxl_ctx_free with such timeouts outstanding, > because that would imply later calling back into libxl with a stale > ctx. > > (Looking at the code I see that the "nexi" are never actually freed. > Bah.) > > Thanks, > Ian. > > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list