On Thu, May 20, 2021 at 10:44 PM CEST, Cong Wang wrote:
> On Thu, May 20, 2021 at 1:14 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
>>
>> Bugs do happen though, so if you can detect some error condition
>> instead of having an infinite loop, then do it.
>
> You both are underestimating the problem. There are two different things
> to consider here:
>
> 1) Kernel bugs: This is known unknown, we certainly do not know
> how many bugs we have, otherwise they would have been fixed
> already. So we can not predict the consequence of the bug either,
> assuming a bug could only cause packet drop is underestimated.
>
> 2) Configurations: For instance, firewall rules. If the selftests are run
> in a weird firewall setup which drops all UDP packets, there is nothing
> we can do in the test itself. If we have to detect this, then we would
> have to detect netem cases too where packets can be held indefinitely
> or reordered arbitrarily. The possibilities here are too many to detect,
> hence I argue the selftests should setup its own non-hostile environment,
> which has nothing to do with any specific program.
>
> This is why I ask you to draw a boundary: what we can assume and
> what we can't. My boundary is obviously clear: we just assume the
> environment is non-hostile and we can't predict any kernel bugs,
> nor their consequences.
>
> Thanks.

(Sorry for the delay in reviews. I've been out.)

In my mind uAPI tests should not be tailored to the underlying
implementation (non-blocking read after write over loopback succeeds
for TCP), or the environment they run in (packets don't get dropped due
to OOM, signals don't interrupt syscalls).

If it's a non-blocking socket, then EAGAIN can happen. That's the
contract between the kernel and the user-space.

There is already a helper in this test case for polling and reading
with a timeout (see recv_timeout()). IMO we should be using it in all
tests that use non-blocking I/O.
If it's not being used already, that is most likely my fault.