Re: [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Linus,

On Thu, Sep 27, 2012 at 04:40:13PM -0700, Linus Torvalds wrote:
> On Thu, Sep 27, 2012 at 2:51 PM,  <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
> >
> > Some time ago Petr once reproduced a false positive VM_BUG_ON in
> > khugepaged while running the autonuma-benchmark on a large 8 node system.
> > All production kernels out there have DEBUG_VM=n so it was only noticeable
> > on self built kernels.  It's not easily reproducible even on the 8 nodes
> > system.
> >
> > Use page_freeze_refs to prevent speculative pagecache lookups to
> > trigger the false positives, so we're still able to check the
> > page_count to be exact.
> 
> This is too ugly to live. It also fundamentally changes semantics and
> actually makes CONFIG_DEBUG_VM result in totally different behavior.
> 
> I really don't think it's a good feature to make CONFIG_DEBUG_VM
> actually seriously change serialization.
> 
> Either do the page_freeze_refs thing *unconditionally*, presumably
> replacing the current code that does
> 
>                 ...
>                 /* cannot use mapcount: can't collapse if there's a gup pin */
>                 if (page_count(page) != 1) {
> 
> instead, or then just relax the potentially racy VM_BUG_ON() to just
> check >= 2. Because debug stuff that changes semantics really is
> horribly horribly bad.

Agreed. I think we can simply drop that VM_BUG_ON: there's a
putback_lru_page that does a put_page, and then another that does
page_cache_release just after the VM_BUG_ON, so if the count is < 2
we'll notice when the count underflows in free_page_and_swap_cache
(we've a VM_BUG_ON in put_page_testzero for that).

The reason for too ugly to live patch, is that I wanted to take the
opportunity of a workload on a 128 CPU 8 node system that could
reproduce it to verify that it really was the speculative lookup and
not something else (and something else wouldn't have been good :). But
I fully agree with you that after successfully verifying that, it's
good to go with a non-ugly version. Thanks for checking this!

I'll send an updated patch.

> Btw, there are two other thp patches (relating to mlock) floating
> around. They look much more reasonable than this one, but I was hoping
> to see more ack's for them.

Ok! I'll review them too. I read them and they looked all right but I
didn't spend enough time on it yet to be sure and send an ack sorry.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]