On Wed, Mar 07, 2018 at 10:10:29AM +0000, Daniel P. Berrangé wrote:
> On Tue, Mar 06, 2018 at 04:46:05PM -0700, Jim Fehlig wrote:
> > On 03/06/2018 10:58 AM, Daniel P. Berrangé wrote:
> > > Currently both virtlogd and virtlockd use a single worker thread for
> > > dispatching RPC messages. Even this is overkill and their RPC message
> > > handling callbacks all run in short, finite time and so blocking the
> > > main loop is not an issue like you'd see in libvirtd with long running
> > > QEMU commands.
> > >
> > > By setting max_workers==0, we can turn off the worker thread and run
> > > these daemons single threaded. This in turn fixes a serious problem in
> > > the virtlockd daemon whereby it loses all fcntl() locks at re-exec due
> > > to multiple threads existing. fcntl() locks only get preserved if the
> > > process is single threaded at time of exec().
> >
> > I suppose this change has no effect when e.g. starting many domains in
> > parallel when locking is enabled. Before the change, there's still only one
> > worker thread to process requests.
> >
> > I've tested the series and locks are now preserved across re-execs of
> > virtlockd. Question is whether we want this change or pursue fixing the
> > underlying kernel bug?
> >
> > FYI, via the non-public bug I asked a glibc maintainer about the lost lock
> > behavior. He agreed it is a kernel bug and posted the below comment to the
> > bug.
> >
> > Regards,
> > Jim
> >
> > First, I agree that POSIX file record locks (i.e. the fcntl F_SETLK ones, which
> > you're using) _are_ to be preserved over execve (absent any FD_CLOEXEC of
> > course, which you aren't using). Relevant quote from fcntl(2):
> >
> >     Record locks are not inherited by a child created via fork(2),
> >     but are preserved across an execve(2).
> >
> > Second I agree that the existence or non-existence of threads must not play
> > a role in the above.
>
> I've asked some Red Hat experts too and they suggest it looks like a kernel
> bug. The question is whether this is a recent kernel regression that is easily
> fixed, or whether it's a long term problem.
>
> I've at least verified that this broken behaviour existed in RHEL-7 (but it's
> possible it was backported when OFD locks were implemented). I still want to
> test RHEL-6 and RHEL-5 to see if this problem goes back indefinitely.

I've checked RHEL-6 and RHEL-5 and both are affected, so this is a
long-standing Linux problem, and we'll need to work around it.

FYI, I've got a kernel bug open here to track it from the RHEL side:

  https://bugzilla.redhat.com/show_bug.cgi?id=1552621

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|