Updated subject line, and here's the link to the original discussion
for new people:

    https://lore.kernel.org/all/B88D3073-440A-41C7-95F4-895D3F657EF2@xxxxxxxxx/

On Mon, Oct 31, 2022 at 10:28 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Ok. At that point we no longer have the pte or the virtual address, so
> it's not going to be exactly the same debug output.
>
> But I think it ends up being fairly natural to do
>
>         VM_WARN_ON_ONCE_PAGE(page_mapcount(page) < 0, page);
>
> instead, and I've fixed that last patch up to do that.

Ok, so I've got a fixed set of patches based on the feedback from
PeterZ, and also tried to do the s390 updates for this blindly, and
pushed them out into a git branch:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=mmu_gather-race-fix

If people really want to see the patches in email again, I can do
that, but most of you already have, and the changes are either trivial
fixes or the s390 updates.

For the s390 people that I've now added to the participant list maybe
the git tree is fine - and the fundamental explanation of the problem
is in that top-most commit (with the three preceding commits being
prep-work). Or that link to the thread about this all.

That top-most commit is also where I tried to fix things up for s390
that uses its own non-gathering TLB flush due to
CONFIG_MMU_GATHER_NO_GATHER.

NOTE NOTE NOTE! Unlike my regular git branch, this one may end up
rebased etc for further comments and fixes. So don't consider that
stable, it's still more of an RFC branch.

At a minimum I'll update it with Ack's etc, assuming I get those, and
my s390 changes are entirely untested and probably won't work.

As far as I can tell, s390 doesn't actually *have* the problem that
causes this change, because of its synchronous TLB flush, but it
obviously needs to deal with the change of rmap zapping logic.

Also added a few people who are explicitly listed as being mmu_gather
maintainers.
Maybe people saw the discussion on the linux-mm list, but let's make
it explicit.

Do people have any objections to this approach, or other suggestions?

I do *not* consider this critical, so it's a "queue for 6.2" issue for
me. It probably makes most sense to queue in the -MM tree (after the
thing is acked and people agree), but I can keep that branch alive too
and just deal with it all myself as well.

Anybody?

                 Linus