Daniel P. Berrange wrote: > On Mon, Jul 03, 2017 at 05:20:13PM +0400, Roman Bogorodskiy wrote: > > Ján Tomko wrote: > > > > > [cc: Guido] > > > > > > On Sat, Jul 01, 2017 at 02:18:58PM +0400, Roman Bogorodskiy wrote: > > > > Andrea Bolognani wrote: > > > >> virnetsockettest also fails pretty often for me, certainly > > > >> more than your figure; even if that wasn't the case, 1/5 > > > >> failure rate is way too high for a CI job. > > > > > > > >I played a little more with virnetsockettest to get real stats and > > > >figured the following: > > > > > > > > 1. On my desktop (i5) and laptop (i3), I didn't get any failures in 50 > > > > 'check' runs > > > > 2. On a VM that I use to run test builds in Jenkins, out of 50 runs it > > > > fails from 1 to 6 times; I did this test a couple of times and either I > > > > was lucky or failure rate is higher when my Jenkins perform regular > > > > builds. > > > > > > > >Anyway, I'll try to find a way to debug what's going on with > > > >virnetsockettest. > > > > > > > > > > IIRC Debian disabled this test years ago. > > > > > > Guido, have you ever discovered the cause? > > > > > > Jan > > > > I made some experiments on the weekend, and here are my results: > > > > On a box where test fails from time to time, it fails at this point: > > > > virObjectUnref(csock); > > > > for (i = 0; i < nlsock; i++) { > > if (virNetSocketAccept(lsock[i], &ssock) != -1 && ssock) { > > char c = 'a'; > > if (virNetSocketWrite(ssock, &c, 1) != -1 && > > virNetSocketRead(ssock, &c, 1) != -1) { > > VIR_DEBUG("Unexpected client socket present"); <--- HERE > > goto cleanup; > > } > > } > > virObjectUnref(ssock); > > ssock = NULL; > > } > > > > On a box where this test never fails, it reaches this block, but: > > > > * virNetSocketWrite(ssock, &c, 1) != -1 > > * virNetSocketRead(ssock, &c, 1) == -1 > > > > It's enough to make the test pass. On a failing box both Write() and Read() > > return != -1 when the test fails. > > We discussed this on IRC and what is happening here is a race condition. > > The test suite is connecting a client to the server and then immediately > closing the client connection. When the server tries to accept the > client, usually it'll get -1 because the client has already gone away. > Socket termination is a multi-stage process at the network layer, and > so there is a non-zero chance that Accept will succeed. The test suite > is assuming that if the accept did succeeed, then we'll get I/O error > on read & write, but this is also not actually guaranteed. It is > possible that write may suceed buffering output, and read may simply > see '0' for EOF. When this happens the test suite will fail. Roman > debugging a failing run & confirmed this is exactly what's happening. > > IOW, this test suite is just plain broken. It needs rewriting to do > a more sensible real world test. ie spawn a thread to act as the > server, and have the server just read from the client & echo it > back to the client. The main thread acts as the client and tests > this echo'ing. This is race free and is an real-world example of > usage. Yeah, I'll try to re-write this test, this weekend hopefully. Roman Bogorodskiy
Attachment:
signature.asc
Description: PGP signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list