Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

On Sat, Apr 25, 2015 at 12:42:25AM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 23, 2015 at 06:23:11PM +0200, Andrea Arcangeli wrote:
> > On Wed, Apr 22, 2015 at 12:26:55PM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 22, 2015 at 11:33 AM, Kirill A. Shutemov
> > > <kirill@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Could you try patch below instead? This can give a clue what's going on.
> > > 
> > > Just FYI, I've done the revert in my tree.
> > > 
> > > Trying to figure out what is going on despite that is obviously a good
> > > idea, but I'm hoping that my merge window is winding down, so I am
> > > trying to make sure it's all "good to go"..
> > 
> > Sounds safer to defer it, agreed.
> > 
> > Unfortunately I also can only reproduce it only on a workstation where
> > it wasn't very handy to debug it as it'd disrupt my workflow and it
> > isn't equipped with reliable logging either (and the KMS mode didn't
> > switch to console to show me the oops either). It just got it logged
> > once in syslog before freezing.
> > 
> > The problem has to be that there's some get_page/put_page activity
> > before and after a PageAnon transition and it looks like a tail page
> > got mapped by hand in userland by some driver using 4k ptes which
> > isn't normal
> 
> Compound pages mapped with PTEs predates THP. See f3d48f0373c1.

Yes, I intended "normal" as a feeling about it considering it's your
new patchset that tries to introduce that behavior for regular anon
pages, I didn't imply it was not ok for driver-owned pages, sorry for
the confusion.

> I looked into code a bit more. And the VM_BUG_ON_PAGE() is bogus. See
> explanation in commit message below.
> 
> Tail page refcounting is mess. Please consider reviewing my patchset which
> drops it [1]. ;)
> 
> Linus, how should we proceed with reverted patch? Should I re-submit it to
> Andrew? Or you'll re-revert it?

You could resubmit the old patch together with this patch, so they go
together.

In retrospect it may have been cleaner to pick another field than
mapcount for the tail page refcounting, then the VM_BUG_ON could have
been retained.

mapcount was picked as candidate of tail page refcounting, because it
already implemented a "count" too so it was simpler to use than
another random 32bit word and didn't require further unions into the
page struct.

page_count cannot be used for refcounting tail pages of THP or
speculative pagecache lookups race against
split_huge_page_refcount. For non-THP it doesn't matter, even
page_count could have been used or even better no tail refcounting at
all like with your patch.

With your patch you're basically disabling the tail page refcounting
for those usages so it probably doesn't matter anymore to move away
from mapcount and removing the VM_BUG_ON doesn't concern me by now (it
never actually triggered before).

Even for those usages doubling up the refcounting in mapcount, the
refcounting of 4.0 was safe. The false positive VM_BUG_ON could only
happen only after the change.

> [1] lkml.kernel.org/g/1429823043-157133-1-git-send-email-kirill.shutemov@xxxxxxxxxxxxxxx
> 
> From 854cdc961b7f83f04a83144ab4f7459ae46b0f3d Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> Date: Fri, 24 Apr 2015 23:49:04 +0300
> Subject: [PATCH] mm: drop bogus VM_BUG_ON_PAGE assert in put_page() codepath
> 
> My patch 8d63d99a5dfb which was merged during 4.1 merge window caused
> regression:
> 
>   page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
>   flags: 0x8000000000008014(referenced|dirty|tail)
>   page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
>   ------------[ cut here ]------------
>   kernel BUG at mm/swap.c:134!
> 
> The problem can be reproduced by playing *two* audio files at the same
> time and then stopping one of players. I used two mplayers to trigger
> this.
> 
> The VM_BUG_ON_PAGE() which triggers the bug is bogus:
> 
> Sound subsystem uses compound pages for its buffers, but unlike most
> __GFP_COMP users sound maps compound pages to userspace with PTEs.
> 
> In our case with two players map the buffer twice and therefore elevates

I didn't think at the case of mapping the same compound page twice in
userland, this clearly explains the crash. I thought it had to be the
PageAnon flipping somehow but I couldn't explain where... and in fact
it was something else.

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Reported-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Reported-by: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>  mm/swap.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index a7251a8ed532..a3a0a2f1f7c3 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -131,7 +131,6 @@ void put_unrefcounted_compound_page(struct page *page_head, struct page *page)
>  		 * here, see the comment above this function.
>  		 */
>  		VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
> -		VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
>  		if (put_page_testzero(page_head)) {
>  			/*
>  			 * If this is the tail of a slab THP page,

Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

If you resend a consolidated commit with the two changes in the same
commit, feel free to retain the Reviewed-by for both.

Optionally you could turn it into a VM_BUG_ON_PAGE(page_mapcount(page)
< 0, page) but it's up to you.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]