On Fri, Dec 6, 2019 at 7:50 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The "make goes slow" problem bisects down to b667b8673443 ("pipe:
> Advance tail pointer inside of wait spinlock in pipe_read()").

I'm not entirely sure that ends up being 100% true. It did bisect to
that, but the behavior wasn't entirely stable. There definitely is
some nasty timing trigger.

But I did finally figure out what seems to have been going on with at
least the biggest part of the build performance regression. It seems
to be a nasty interaction with the scheduler and the GNU make
jobserver, and in particular the pipe wakeups really _really_ do seem
to want to be synchronous both for the readers and the writers.

When a writer wakes up a reader, we want the reader to react quickly
and vice versa. The most obvious case was for the GNU make jobserver,
where sub-makes would do a single-byte write to the jobserver pipe,
and we want to wake up the reader *immediately*, because the reader is
actually a lot more important than the writer. The reader is what gets
the next job going, the writer just got done with the last one.

And when a reader empties a full pipe, it's because the writer is
generating data, and you want to just get the writer going again asap.

Anyway, I've spent way too much time looking at this and wondering
about odd performance patterns. It seems to be mostly back up to
normal.

I say "mostly", because I still see times of "not as many concurrent
compiles going as I'd expect". It might be a kbuild problem, it might
be an issue with GNU make (I've seen problems with the make jobserver
wanting many more tokens than expected before and the kernel makefiles
- it might be about deep subdirectories etc), and it might be some
remaining pipe issue. But my allmodconfig builds aren't _enormously_
slower than they used to be.

But there's definitely some unhappy interaction with the jobserver.
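
To make the jobserver traffic concrete: the token handoff described
above is nothing more than single-byte reads and writes on a shared
pipe. A rough Python sketch of that pattern follows. It assumes a
plain os.pipe() standing in for the jobserver pipe, and the helper
names (make_jobserver, acquire_token, release_token, demo) are made up
for illustration; this is not GNU make's actual code.

```python
import os


def make_jobserver(tokens):
    """Create a pipe preloaded with one byte per job token (illustrative)."""
    rfd, wfd = os.pipe()
    os.write(wfd, b"+" * tokens)
    return rfd, wfd


def acquire_token(rfd):
    """Take one token; blocks in read() while the pipe is empty."""
    return os.read(rfd, 1)


def release_token(wfd, token=b"+"):
    """Hand a token back with a single-byte write.

    This write is what has to wake a blocked reader.
    """
    os.write(wfd, token)


def demo():
    rfd, wfd = make_jobserver(2)
    t1 = acquire_token(rfd)
    t2 = acquire_token(rfd)
    # The pipe is now empty: another acquire_token() would sleep
    # until somebody releases a token.
    release_token(wfd, t1)
    t3 = acquire_token(rfd)
    os.close(rfd)
    os.close(wfd)
    return t1, t2, t3
```

Every release_token() here is a one-byte write that some blocked
acquire_token() may be waiting on, which is why the latency of that
wakeup matters so much for keeping jobs flowing.
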
I have 16 threads (8 cores with HT), and I generally use "make -j32"
to keep them busy because the jobserver isn't great. The pipe rework
made even that 2x slop not work all that well. Something held on to
tokens too long, and there was definitely some interaction with the
pipe wakeup code. Using "-j64" hid the problem, but it was a problem.

It might be the new scheduler balancing changes that are interacting
with the pipe thing. I'm adding PeterZ, Ingo and Vincent to the cc,
because I hadn't realized just how important the sync wakeup seems to
be for pipe performance even at a big level.

I've pushed out my pipe changes. I really didn't want to do that kind
of stuff at the end of the merge window, but I spent a lot more time
than I wanted looking at this code, because I was getting to the point
where the alternative was to just revert it all.

DavidH, give these a look:

  85190d15f4ea pipe: don't use 'pipe_wait() for basic pipe IO
  a28c8b9db8a1 pipe: remove 'waiting_writers' merging logic
  f467a6a66419 pipe: fix and clarify pipe read wakeup logic
  1b6b26ae7053 pipe: fix and clarify pipe write wakeup logic
  ad910e36da4c pipe: fix poll/select race introduced by the pipe rework

the top two of which are purely "I'm fed up looking at this code, this
needs to go" kind of changes.

In particular, that last change is because I think the GNU jobserver
problem is partly a thundering herd issue: when a job token becomes
free (ie somebody does a one-byte write to an empty jobserver pipe),
it wakes up *everybody* who is waiting for a token. One of them will
get it, and the others will go to sleep again. And then it repeats all
over. I didn't fix it, but it _could_ be fixed with exclusive waits
for readers/writers, but that means more smarts than pipe_wait() can
do.

And because the jobserver isn't great at keeping everybody happy, I'm
using a much bigger "make -jX" value than the number of CPUs I have,
which makes the herd bigger.
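
The shape of that herd can be sketched from userspace. The Python
below is only an illustration under stated assumptions: threads stand
in for sub-makes, os.pipe() stands in for the jobserver pipe, and
herd_demo is a made-up name. It shows many waiters competing for one
freed token with exactly one winner. It cannot show how many sleepers
the kernel actually wakes, since that is not visible from here.

```python
import os
import threading
import time


def herd_demo(waiters=8):
    """Park `waiters` readers on an empty pipe, then free one token."""
    rfd, wfd = os.pipe()
    results = []
    lock = threading.Lock()

    def wait_for_token():
        data = os.read(rfd, 1)  # sleeps while the pipe is empty
        with lock:
            results.append(data)

    threads = [threading.Thread(target=wait_for_token)
               for _ in range(waiters)]
    for t in threads:
        t.start()

    # Give the waiters a moment to block in read(). This only makes the
    # demo resemble the real situation; the result does not depend on it.
    time.sleep(0.1)

    os.write(wfd, b"+")  # one token becomes free
    # Close the write end so the losers see EOF (b"") and exit. In a
    # real jobserver they would go back to sleep instead.
    os.close(wfd)

    for t in threads:
        t.join()
    os.close(rfd)

    winners = sum(1 for d in results if d == b"+")
    return winners, waiters - winners
```

Calling herd_demo(32) models a "-j32" build with every job slot
waiting: one reader gets the token and the other 31 had nothing to do.
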
And I suspect none of this helps the scheduler pick the _right_
process to run, which just makes scheduling an even bigger problem.

                Linus