On 03/06/2018 10:58 AM, Daniel P. Berrangé wrote:
Currently both virtlogd and virtlockd use a single worker thread for dispatching RPC messages. Even this is overkill: their RPC message handling callbacks all run in short, finite time, so blocking the main loop is not an issue like you'd see in libvirtd with long-running QEMU commands. By setting max_workers==0, we can turn off the worker thread and run these daemons single threaded. This in turn fixes a serious problem in the virtlockd daemon whereby it loses all fcntl() locks at re-exec due to multiple threads existing. fcntl() locks only get preserved if the process is single threaded at the time of exec().
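For anyone not steeped in the locking code, the locks in question are ordinary POSIX record locks taken with fcntl(F_SETLK). A minimal sketch of that pattern follows; it is illustrative only, not virtlockd's actual code, and the lease file name is made up:

/* Illustrative only: take a whole-file write lock the way a lock
 * daemon might, using a POSIX record lock (fcntl F_SETLK). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("lease.lock", O_WRONLY | O_CREAT, 0600);
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;       /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 == to end of file, i.e. whole file */

    if (fd < 0 || fcntl(fd, F_SETLK, &fl) < 0) {
        perror("lock");
        return 1;
    }

    /* Per fcntl(2), this lock is not inherited across fork() but is
     * preserved across execve(), as long as the fd stays open and
     * FD_CLOEXEC is not set on it. */
    printf("holding write lock on lease.lock\n");
    pause();
    return 0;
}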
I suppose this change has no effect when e.g. starting many domains in parallel with locking enabled: even before the change, there was only one worker thread to process requests.
I've tested the series, and locks are now preserved across re-execs of virtlockd. The question is whether we want this change or should instead pursue fixing the underlying kernel bug.
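For reference, the lost-lock behaviour can be demonstrated with a small standalone program along the following lines. This is only a sketch: the actual reproducer analysed in the strace quoted further down isn't public, so file names and details here are guesses. It takes an fcntl() write lock, starts a second thread, re-execs itself, and then checks the lock from a forked child; the child is used because F_GETLK ignores locks held by the calling process itself.

/*
 * Rough sketch of a reproducer for the lost-lock behaviour (details
 * guessed; only the overall shape - F_SETLK, spawn a thread, execve(),
 * re-check - follows the strace quoted below).
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void *idle_thread(void *arg)
{
    /* Just exist, so the process is multi-threaded at execve() time. */
    pause();
    return arg;
}

int main(int argc, char **argv)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 42;

    if (argc > 1) {
        /* Re-exec'ed instance: the inherited descriptor number was
         * passed on the command line.  Check the lock from a forked
         * child, because F_GETLK ignores locks held by the caller. */
        int fd = atoi(argv[1]);
        pid_t pid = fork();
        if (pid == 0) {
            if (fcntl(fd, F_GETLK, &fl) < 0) {
                perror("fcntl(F_GETLK)");
                _exit(2);
            }
            printf("after exec: lock is %s\n",
                   fl.l_type == F_UNLCK ? "GONE" : "still held");
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        return 0;
    }

    /* First instance: take the lock, make sure the fd survives exec,
     * start a second thread, then re-exec ourselves. */
    int fd = open("lock.txt", O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0 || fcntl(fd, F_SETLK, &fl) < 0) {
        perror("lock");
        return 1;
    }
    fcntl(fd, F_SETFD, 0);             /* clear FD_CLOEXEC, as in the strace */

    pthread_t tid;
    pthread_create(&tid, NULL, idle_thread, NULL);

    sleep(10);                         /* matches the ~10s gap in the strace */

    char fdstr[16];
    snprintf(fdstr, sizeof(fdstr), "%d", fd);
    char *args[] = { argv[0], fdstr, NULL };
    execv(argv[0], args);
    perror("execv");
    return 1;
}

Build with something like "cc -pthread" and run with no arguments. On an affected kernel the re-exec'ed instance reports the lock as gone; drop the pthread_create() call and it should report it as still held.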
FYI, via the non-public bug I asked a glibc maintainer about the lost lock behavior. He agreed it is a kernel bug and posted the below comment to the bug.
Regards,
Jim

First, I agree that POSIX file record locks (i.e. the fcntl F_SETLK ones, which you're using) _are_ to be preserved over execve (absent any FD_CLOEXEC of course, which you aren't using). (Relevant quote from fcntl(2): "Record locks are not inherited by a child created via fork(2), but are preserved across an execve(2).")

Second, I agree that the existence or non-existence of threads must not play a role in the above.

Third, I see this from strace of your reproducer (only relevant snippets, with a bit of explanation):

13:54:09.581429 clone(child_stack=0x7f74b0517ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f74b05189d0, tls=0x7f74b0518700, child_tidptr=0x7f74b05189d0) = 30911
Process 30911 attached

So, here we created the thread 30911. PID 30910 is now taking the lock:

[pid 30910] 13:54:09.581562 open("lock.txt", O_WRONLY|O_CREAT|O_TRUNC, 0755 <unfinished ...>
[pid 30911] 13:54:09.581595 set_robust_list(0x7f74b05189e0, 24 <unfinished ...>
[pid 30910] 13:54:09.581647 <... open resumed> ) = 3
[pid 30911] 13:54:09.581671 <... set_robust_list resumed> ) = 0
[pid 30910] 13:54:09.581693 fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=42} <unfinished ...>
[pid 30911] 13:54:09.581722 rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...>
[pid 30910] 13:54:09.581750 <... fcntl resumed> ) = 0

Lock acquired. Now we do a little debug output and sleeping, and the only actions on FD 3 during this are:

[pid 30910] 13:54:09.581790 fcntl(3, F_GETFD <unfinished ...>
[pid 30910] 13:54:09.581872 <... fcntl resumed> ) = 0
[pid 30910] 13:54:09.581911 fcntl(3, F_SETFD, 0 <unfinished ...>
[pid 30910] 13:54:09.581958 <... fcntl resumed> ) = 0

Then comes the execve from 30910, with no intervening thread exit or the like. But of course during the execve the thread 30911 is killed:

[pid 30910] 13:54:19.582600 execve("/suse/matz/lock-strangeness", ["/suse/matz/lock-strangeness", "3"], [/* 0 vars */] <unfinished ...>
[pid 30911] 13:54:19.583422 +++ exited with 0 +++
13:54:19.583749 <... execve resumed> ) = 0
13:54:19.583845 brk(0) = 0xc0c000

So, 30910 is alone again and is executing, loading the shared libs, relocating and so on. The first action done to FD 3 (retained over the execve) is:

13:54:19.588016 fcntl(3, F_GETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=42, pid=0}) = 0
13:54:19.588101 fcntl(3, F_GETFD) = 0
13:54:19.588480 fcntl(3, F_SETFD, FD_CLOEXEC) = 0

Bleah! F_UNLCK we get, which isn't what we're supposed to get. As there are no intervening syscalls that could in any way have removed the lock (in fact there are none other to file descriptor 3), it can't be glibc or libpthread (or anything else userspace is doing) that would have caused the lock to be removed. It's the kernel itself, and hence a bug therein.

If I may hazard a guess, it would be that the forced exits of all non-main threads done by the kernel somehow lead to cleaning up the locks, possibly because the actions that need doing during fork(2) (which also include other-threads-exit, but also lock-removal) are conflated with the actions that need to happen during execve(2) (which must _not_ include lock-removal).