On Tue, 5 Jan 2021, Qian Cai wrote:
> On Tue, 2021-01-05 at 13:35 -0800, Hugh Dickins wrote:
> > This patchset went into mmotm 2020-11-16-16-23, so probably linux-next
> > on 2020-11-17: you'll have had three trouble-free weeks testing with it
> > in, so it's not a likely suspect. I haven't looked yet at your report,
> > to think of a more likely suspect: will do.
>
> Probably my memory was bad then. Unfortunately, I had 2 weeks holidays before
> the Thanksgiving as well. I have tried a few times so far and only been able to
> reproduce once. Looks nasty...

I have not found a likely suspect.

What it smells like is a defect in cloning anon_vma during fork,
such that mappings of the THP can get added even after all that
could be found were unmapped (tree lookup ordering should prevent
that). But I've not seen any recent change there.

It would be very easily fixed by deleting the whole BUG() block,
which is only there as a sanity check for developers: but we would
not want to delete it without understanding why it has gone wrong
(and would also have to reconsider two related VM_BUG_ON_PAGEs).

It is possible that b6769834aac1 ("mm/thp: narrow lru locking") of this
patchset has changed the timing and made a pre-existing bug more likely
in some situations: it used to hold an lru_lock before that BUG() on
total_mapcount(), and now does not; but that's not a lock which should
be relevant to the check.

When you get more info (or not), please repost the bugstack in a
new email thread: this thread is not really useful for pursuing it.

Hugh