On Tue, Dec 10, 2019 at 6:38 AM Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote: > > On Mon, 9 Dec 2019 at 18:48, Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > Before that commit the buggy jobserver code basically does > > > > (1) use pselect() to wait for readable and see child deaths atomically > > (2) use blocking read to get the token > > > > and while (1) is atomic, if the child death happens between the two, > > it goes into the blocking read and has SIGCHLD blocked, so it will try > > to read the token from the token pipe, but it will never react to the > > child death - and the child death is what is going to _release_ a > > token. > > > > So what seems to happen is that when the right timing triggers, you > > That can explain why I can't see the problem on my platform Note that the above is kind of simplified. It actually needs a bit more to trigger.. To lose _one_ token, you need to have a sub-make basically hit this race: - the pselect() needs to say that the pipe is readable, so there is at least one token - another sub-make comes along and steals the very last token - but because pselect returned "readable" (no SIGCHLD yet), the read() starts and now blocks because all the jobserver tokens are gone again due to the other sub-make stealing the last one. - before a new token comes in, the child exits, and now because the sub-make is blocking for reads (and because the jobserver blocks SIGCHILD in general outside of the pselect), it doesn't react, so it won't release the token that the child holds. but notice how any _other_ sub-make then releasing a token will get things going again, so the _common_situation is that the jobserver bug only causes a slight dip in concurrency. Hitting it _once_ is trivial. Losing several tokens at once is also not that hard: you don't need to hit the race many times, it's enough to hit the exact same race just once - just with several sub-makes at the same time. And that "lose several tokens at once" isn't as unlikely as you'd think: they are all doing the same thing, and they all saw the free token with "pselect()", they all did a "read()". And since it's common for the tokens to be gone, the common case is that _one_ of the waiting sub-makes got the read, and the N other sub-makes did not, and went into the blocking read(). And they all had children that were about to finish, and finished before the next token got available. So losing quite a few tokens is easy. This has actually gone on for a long time, and I just never debugged it. My solution has been "I have 16 threads (8 core with HT), but I'll use -j32, and it is all good". I bet you can see it too - the buggy jobserver just means that the load isn't as high as you'd expect. Just run 'top' while the make is going. With the _fixed_ jobserver, if I do "make -j32", I will actually see a load that is over 32 (I do nothing but kernel compiles and occasional reboots during the merge window, so I have the kernel in my cache, so there's no IO, I have a fast SSD so writeback doesn't cause any delays either etc etc, and I have my browser and a few other things going). With the buggy one, even before the pipe rework, I would see a load that tended to fluctuate around 16. Because due to the bug you have a few locked-up tokens at times, so instead of getting a load of 32 when you use "make -j32", you get a load of maybe 20. Or maybe less. The pipe re-work made it much easier to trigger the "almost all the tokens are gone" for some reason. And the "fair readers()" patch I have seems to make it _really_ easy to trigger the case where absolutely all the tokens were gone and it goes into a single-thread mode. I'm not sure I really ever saw the 1-second timeout trigger, but it was slow. But it _is_ dependent on timing, so somebody else with a different load - or a different machine - might not see it to nearly the same degree. I bet you see the load value difference, though, even if you don't necessarily see enough idle CPU time to see much of a difference in compile times. After all, once you have all CPU's busy, it doesn't matter if you have a load of 16 or a load of 32 - the higher load won't make the compile go any faster. Linus