Re: mm: BUG in unmap_page_range

On 09/11/2014 07:39 AM, Hugh Dickins wrote:
> On Wed, 10 Sep 2014, Sasha Levin wrote:
>> On 09/10/2014 03:36 PM, Hugh Dickins wrote:
>>> Right, and Sasha reports that that can fire, but he sees the bug
>>> with this patch in and without that firing.
>>
>> I've changed that WARN_ON_ONCE() to a VM_BUG_ON_VMA() to get some useful
>> VMA information out, and got the following:
> 
> Well, thanks, but Mel and I have both failed to perceive any actual
> problem arising from that peculiarity.  And Mel's warning, and the 900s
> in yesterday's dumps, have shown that it is not correlated with the
> pte_mknuma() bug we are chasing.  So there isn't anything that I want to
> look up in these vmas.  Or did you notice something interesting in them?

I thought this was a separate issue that would need taking care of as well.

>> And on a maybe related note, I've started seeing the following today. It may
>> be because we fixed mbind() in trinity but it could also be related to
> 
> The fixed trinity may be counter-productive for now, since we think
> there is an understandable pte_mknuma() bug coming from that direction,
> but have not posted a patch for it yet.

I'm still seeing the bug with the fixed trinity; the fix was just a matter of
adding more flags to mbind().

>> this issue (free_pgtables() is in the call chain). If you don't think it has
>> anything to do with it let me know and I'll start a new thread:
>>
>> [ 1195.996803] BUG: unable to handle kernel NULL pointer dereference at           (null)
>> [ 1196.001744] IP: __rb_erase_color (include/linux/rbtree_augmented.h:107 lib/rbtree.c:229 lib/rbtree.c:367)
>> [ 1196.001744] Call Trace:
>> [ 1196.001744] vma_interval_tree_remove (mm/interval_tree.c:24)
>> [ 1196.001744] __remove_shared_vm_struct (mm/mmap.c:232)
>> [ 1196.001744] unlink_file_vma (mm/mmap.c:246)
>> [ 1196.001744] free_pgtables (mm/memory.c:547)
>> [ 1196.001744] exit_mmap (mm/mmap.c:2826)
>> [ 1196.001744] mmput (kernel/fork.c:654)
>> [ 1196.001744] do_exit (./arch/x86/include/asm/thread_info.h:168 kernel/exit.c:461 kernel/exit.c:746)
> 
> I didn't study in any detail, but this one seems much more like the
> zeroing and vma corruption that you've been seeing in other dumps.
> 
> Though a single pte_mknuma() crash could presumably be caused by vma
> corruption (but I think not mere zeroing), the recurrent way in which
> you hit that pte_mknuma() bug in particular makes it unlikely to be
> caused by random corruption.
> 
> You are generating new crashes faster than we can keep up with them.
> Would this be a suitable point for you to switch over to testing
> 3.17-rc, to see if that is as unstable for you as -next is?
> 
> That VM_BUG_ON(!(val & _PAGE_PRESENT)) is not in the 3.17-rc tree,
> but I think you can "safely" add it to 3.17-rc.  Quotes around
> "safely" meaning that we know that there's a bug to hit, at least
> in -next, but I don't think it's going to be hit for stupid obvious
> reasons.

I'll try it. Usually I just hit a bunch of issues that were already fixed
in -next, which is why I try to stick to one tree.

> And you're using a gcc 5 these days?  That's another variable to
> try removing from the mix, to see if it makes a difference.

I'm seeing the BUG getting hit with gcc 4.7.2, so I don't think it's compiler
dependent. I'll try reproducing everything I reported yesterday with 4.7.2
just in case, but I don't think that this is the issue.


Thanks,
Sasha
