On 02/14/2011 06:45 AM, Daniel P. Berrange wrote:
> On Mon, Feb 14, 2011 at 12:32:18PM +0000, Daniel P. Berrange wrote:
>> I'm getting periodic failures of the 'commandtest' case where the diff
>> is
>>
>>   > DAEMON:no
>>   --
>>   < DAEMON:yes
>>
>> For test cases 'test3' through to 'test15', except 'test4'. Tests 1,
>> 2, 4, 16, 17, 18 are all unaffected. I can never reproduce it when
>> I run just that one test case manually, but the automated builds
>> I'm doing hit it > 50% of the time so it is clearly some kind of race
>> condition. The line of code that's getting confused is this
>>
>>   fprintf(log, "DAEMON:%s\n", getpgrp() == getsid(0) ? "yes" : "no");
>>
>> but I'm not clear how this can be going wrong. Ideas ?
>
> Hmm, this seems to occur when you run the build under cron. Under an
> interactive shell getpgrp() != getsid() by default. Under non-interactive
> shell, then getpgrp() == getsid() even if we didn't ask to daemonize.
> So this approach to detecting whether we're daemonized isn't reliable.

It's more reliable than the previous test, which was getppid() == 1,
since that was even more racy depending on whether the intermediate
process had exited yet to result in the grandchild being reparented to
init.

But yes, I think you correctly analyzed the problem of running from cron
rather than an interactive shell.

>
> Perhaps the 'commandtest' program itself needs to call setsid()+setpgid()
> before running the child processes ?

That sounds like a reasonable solution. Do you want to tackle it, or
shall I?

I'm also seeing that test fail on cygwin; probably due to .exe suffixes.
But I haven't had time yet to investigate how easy that would be to patch.

--
Eric Blake   eblake@xxxxxxxxxx    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
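
For reference, here is a rough, untested-against-libvirt sketch of why the
getpgrp() == getsid(0) heuristic depends on how the process was started,
and how forcing a known process group gives a stable baseline. It is not
the commandtest code or a proposed patch; looks_daemonized() and
probe_child() are hypothetical names invented for this illustration. The
child either calls setsid() (so it leads a new session, and its process
group ID equals its session ID) or setpgid(0, 0) (so it leads a new
process group inside the inherited session, and the two IDs differ).

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same heuristic as the quoted fprintf() line: report "daemonized"
 * when the process group ID equals the session ID. */
static int looks_daemonized(void)
{
    return getpgrp() == getsid(0);
}

/* Hypothetical helper: fork a child, put it in a known state, and
 * return what the heuristic reports there (1 = yes, 0 = no, -1 = error).
 *   new_session != 0: child calls setsid() and becomes a session leader.
 *   new_session == 0: child calls setpgid(0, 0) and becomes the leader of
 *                     a new process group within the inherited session. */
static int probe_child(int new_session)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* A freshly forked child is never a process group leader or a
         * session leader, so both calls are expected to succeed here. */
        if (new_session) {
            if (setsid() < 0)
                _exit(2);
        } else {
            if (setpgid(0, 0) < 0)
                _exit(2);
        }
        _exit(looks_daemonized() ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Calling probe_child(0) and probe_child(1) from a small main() should give
the same pair of answers whether the program is launched from cron or from
an interactive shell, because the child no longer relies on whatever
process group it happened to inherit.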
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list