On Mon, 9 Dec 2019 at 18:48, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> [ Added DJ to the participants, since he seems to be the Fedora make
> maintainer - DJ, any chance that this absolutely horrid 'make' bug can
> be fixed in older versions too, not just rawhide? The bugfix is two
> and a half years old by now, and the bug looks real and very serious ]
>
> On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > Which version of make should I use to reproduce the problem?
>
> So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.
> I'm assuming it's fairly plain 4.2.1, but I didn't try to look into
> the source rpm or anything like that.

I'm using Debian buster, where the make package is version 4.2.1-1.2
for arm64. It doesn't have the commit you mention below, but I don't
see the problem on my platform: all 8 CPUs are used with -j 16 or even
-j 9.

>
> The working one for me was just the top of -git from
>
>     https://git.savannah.gnu.org/git/make.git
>
> which is 4.2.92 right now.
>
> The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking
> read with pselect to avoid hangs") as per Akemi. That is indeed after
> 4.2.1, and it looks real.
>
> Before that commit the buggy jobserver code basically does
>
>  (1) use pselect() to wait for readable and see child deaths atomically
>  (2) use blocking read to get the token
>
> and while (1) is atomic, if the child death happens between the two,
> it goes into the blocking read and has SIGCHLD blocked, so it will try
> to read the token from the token pipe, but it will never react to the
> child death - and the child death is what is going to _release_ a
> token.
>
> So what seems to happen is that when the right timing triggers, you

That could explain why I can't see the problem on my platform.

> end up with a lot of sub-makes waiting for a token, but they are also
> all supposed to _release_ a token. So you don't have enough tokens to
> go around.
> In the worst case, _everybody_ who has a token is also not
> releasing it, and then you end up triggering the timeout code (after
> one second), which will make things really go into a crawl.
>
> And by a crawl I mean that worst-case you really end up with just one
> job per second per sub-make. It will take _hours_ to compile the
> kernel at that speed, when it normally finishes in 15 minutes on my
> machine even when I do a from-scratch allmodconfig build.
>
> It does seem to be a major bug in the jobserver code. In particular
> with the trial fair and exclusive wakeup patch that I sent out in the
> other thread, it seems to be _reliably_ much worse and triggers 100%
> of the time for me.
>
> It's possible that my trial patch is buggy, but everything else looks
> fine, and with a fixed make the trial patch works for me.
>
> I'll include the trial patch here too, I think I cc'd you on the other
> thread too, but hey..
>
> Anyway, it looks like the sync wakeup thing is more of a "get timing
> right by luck" thing than anything else. Possibly it actually causes
> the reverse order of reader wakeups more often (ie the most _recent_
> reader is most likely to get woken up synchronously) and that may be
> what really ends up masking the jobserver problem, since apparently
> doing wakeups in the fair and proper order makes things much worse..
>
> What a horrible pain that pipe rework ended up being. But I think
> we're in better shape now than we used to be, it just had very
> unfortunate timing issues and several real bugs.
>
> But sadly, there's no way I can push that fair pipe wakeup thing as
> long as this horribly buggy version of make is widespread.
>
> Linus
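
For anyone following along, here is a rough Python sketch of the two-step
token acquisition quoted above and of the general shape of the fix
(wait for readability, then do a non-blocking read). It is not GNU make's
code: the helper names (make_jobserver, try_acquire, release) are made up
for illustration, it uses select() rather than pselect(), and it does not
model SIGCHLD handling at all. The only point is that a non-blocking read
returns immediately when the token is already gone, instead of sleeping
in read() where a child death can no longer be noticed.

```python
import os
import select


def make_jobserver(tokens):
    """Create a pipe preloaded with `tokens` one-byte job tokens.

    Hypothetical helper, loosely modelled on a jobserver token pipe.
    """
    rfd, wfd = os.pipe()
    os.write(wfd, b"+" * tokens)
    return rfd, wfd


def try_acquire(rfd, timeout=0.0):
    """Try to take one token without ever blocking inside read().

    Step (1): wait (up to `timeout` seconds) for the pipe to be readable.
    Step (2): read non-blocking, so losing the race for the token gives
    an immediate error instead of a hang.
    Returns the token byte, or None if no token could be taken.
    """
    os.set_blocking(rfd, False)
    readable, _, _ = select.select([rfd], [], [], timeout)
    if not readable:
        return None
    try:
        # An empty result means the write end was closed (EOF).
        return os.read(rfd, 1) or None
    except BlockingIOError:
        # Someone else took the token between select() and read().
        # With a blocking read, this is where the caller would get stuck.
        return None


def release(wfd, token=b"+"):
    """Put a token back into the pipe so another job can start."""
    os.write(wfd, token)
```

With a single token in the pipe, the first try_acquire() gets it and the
second returns None straight away; after release() the token can be taken
again. A real jobserver client would go back to waiting for child exits
when try_acquire() fails, which is the part the blocking read prevented.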