On Mon, Jan 11, 2021 at 9:55 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > I think it's too early to push this one through to stable: > Linus mentioned on Friday that Michael Larabel of Phoronix > has observed a performance regression from this commit. That turned out to be a red herring. Yes, Michael saw a performance regression on some machines, but the change to go back to the old model (where the repeat was basically at wakeup time rather than in the waiter) didn't actually make any difference. And the issue only showed on a couple of machines, and only with certain configurations (ie apparently switching to the performance governor made it go away). So it seems to have been some modal behavior about random timing (possibly just interaction with cpufreq) rather than a real regression. I think the real issue is simply that some loads are very sensitive to the exact writeback timing patterns. And I think we're making things worse by having some of the patterns be very non-deterministic indeed. For example, long before we actually take the page lock and then wait for (and set) the page writeback bit, look at how we first use the Xarray tags to turn the "page dirty" tag into "page needs writeback" tag, and then look up an array of such writeback pages: all without any real locking at all (apart from the xas lock itself for the tagging op). Making things even less deterministic, the code that doesn't do writeback - but just _wait_ for writeback - doesn't actually serialize with the page lock at all. It _just_ does that "wait_for_page_writeback()", which is ambiguous when there are consecutive writebacks happening. That's actually the case that I think Michael would have seen - because he obviously never saw the (very very rare) BUG_ON. The BUG_ON() in page writeback itself is serialized by the page lock and so there aren't really many possibilities for that to get contention or other odd behavior (the wakeup race being the one very very unlikely notable one). In contrast, the "wait for writeback" isn't serialized by anything else, so that one is literally "if was at writeback at some point, wait for it to no longer be", and then the aggressive wakeup was good for that case, while it caused problems for the writeback case. Anyway, the numbers are all ambiguous, the one-liner fix is not horrible, and the take-away from all of this is likely mostly: it would be good to have some more clarity about the whole writeback and wait-for-writeback thing. In many ways it would be really line to have a sequence count rather than just a single bit. But obviously that does not work for 'struct page'. Anyway, don't hold up this "get rid of BUG_ON() in writeback" patch for this. Linus