Hi Linus, On Thu, Sep 27, 2012 at 04:40:13PM -0700, Linus Torvalds wrote: > On Thu, Sep 27, 2012 at 2:51 PM, <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > From: Andrea Arcangeli <aarcange@xxxxxxxxxx> > > Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy > > > > Some time ago Petr once reproduced a false positive VM_BUG_ON in > > khugepaged while running the autonuma-benchmark on a large 8 node system. > > All production kernels out there have DEBUG_VM=n so it was only noticeable > > on self built kernels. It's not easily reproducible even on the 8 nodes > > system. > > > > Use page_freeze_refs to prevent speculative pagecache lookups to > > trigger the false positives, so we're still able to check the > > page_count to be exact. > > This is too ugly to live. It also fundamentally changes semantics and > actually makes CONFIG_DEBUG_VM result in totally different behavior. > > I really don't think it's a good feature to make CONFIG_DEBUG_VM > actually seriously change serialization. > > Either do the page_freeze_refs thing *unconditionally*, presumably > replacing the current code that does > > ... > /* cannot use mapcount: can't collapse if there's a gup pin */ > if (page_count(page) != 1) { > > instead, or then just relax the potentially racy VM_BUG_ON() to just > check >= 2. Because debug stuff that changes semantics really is > horribly horribly bad. Agreed. I think we can simply drop that VM_BUG_ON: there's a putback_lru_page that does a put_page, and then another that does page_cache_release just after the VM_BUG_ON, so if the count is < 2 we'll notice when the count underflows in free_page_and_swap_cache (we've a VM_BUG_ON in put_page_testzero for that). The reason for too ugly to live patch, is that I wanted to take the opportunity of a workload on a 128 CPU 8 node system that could reproduce it to verify that it really was the speculative lookup and not something else (and something else wouldn't have been good :). But I fully agree with you that after successfully verifying that, it's good to go with a non-ugly version. Thanks for checking this! I'll send an updated patch. > Btw, there are two other thp patches (relating to mlock) floating > around. They look much more reasonable than this one, but I was hoping > to see more ack's for them. Ok! I'll review them too. I read them and they looked all right but I didn't spend enough time on it yet to be sure and send an ack sorry. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>