Re: mm: kernel BUG at mm/memory.c:1230

On Thu, 24 May 2012, Sasha Levin wrote:
> On Thu, May 24, 2012 at 9:07 PM, Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, 24 May 2012 20:27:34 +0200
> > Sasha Levin <levinsasha928@xxxxxxxxx> wrote:
> >
> >> Hi all,
> >>
> >> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> >>
> >> [ 2043.098949] ------------[ cut here ]------------
> >> [ 2043.099014] kernel BUG at mm/memory.c:1230!
> >
> > That's
> >
> >        VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
> >
> > in zap_pmd_range()?
> 
> Yup.
> 
> > The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add
> > debug checks for mapcount related invariants").  AFAICT it's just wrong
> > on the exit path.  Unclear why it's triggering now...

I've been round this loop before with that particular VM_BUG_ON.

At first I thought like Andrew, that it's glaringly wrong on the exit
path; but then changed my mind.

When munmapping, we certainly can arrive here with an unaligned addr
and next; but in that case rwsem_is_locked() is true, since munmap
holds mmap_sem.

Whereas in exiting, rwsem is not locked, but we're going linearly upwards,
and whenever we walk into a pmd_trans_huge area, both addr and next should
be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.

Other cases equally should not arise: madvise MADV_DONTNEED should
have rwsem_is_locked; and truncation or hole-punching shouldn't be
possible on a pure-anonymous (!vma->vm_ops) area considered for THP.
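
To make the argument above concrete, here is a minimal user-space sketch
of it, not the kernel code.  It assumes 2MB huge pages, assumes every pmd
slot in the walked range is pmd_trans_huge, and assumes the assertion
sits on the branch taken when the step covers less than a whole huge
page.  pmd_slot_end(), classify_huge_zap() and count_assertion_hits()
are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2MB huge pages (x86-64 with 4KB base pages). */
#define HPAGE_PMD_SIZE (UINT64_C(1) << 21)
#define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))

/*
 * Simplified model of stepping one pmd slot at a time: return the end of
 * the slot containing addr, clamped to end.  Address wraparound at the
 * top of the address space is ignored here.
 */
static uint64_t pmd_slot_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary;

	assert(addr < end);
	boundary = (addr + HPAGE_PMD_SIZE) & HPAGE_PMD_MASK;
	return boundary < end ? boundary : end;
}

enum zap_action {
	ZAP_WHOLE_HUGE_PMD,	/* step covers the whole huge page */
	SPLIT_HUGE_PMD,		/* partial step, mmap_sem held: split */
	ASSERTION_FIRES		/* partial step without mmap_sem */
};

/* Decide what happens for one [addr, next) step over a huge pmd. */
static enum zap_action classify_huge_zap(uint64_t addr, uint64_t next,
					 bool mmap_sem_locked)
{
	if (next - addr == HPAGE_PMD_SIZE)
		return ZAP_WHOLE_HUGE_PMD;
	return mmap_sem_locked ? SPLIT_HUGE_PMD : ASSERTION_FIRES;
}

/* Walk [start, end) upwards and count the steps that would hit the BUG. */
static unsigned int count_assertion_hits(uint64_t start, uint64_t end,
					 bool mmap_sem_locked)
{
	unsigned int hits = 0;
	uint64_t addr = start;

	while (addr < end) {
		uint64_t next = pmd_slot_end(addr, end);

		if (classify_huge_zap(addr, next, mmap_sem_locked) ==
		    ASSERTION_FIRES)
			hits++;
		addr = next;
	}
	return hits;
}
```

In this model an hpage-aligned range walked without the lock (the exit
case) never reaches the assertion, and an unaligned range walked with the
lock (the munmap case) takes the split branch.  Only an unaligned range
walked without the lock trips it, which is the combination the reasoning
above says should not arise.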

But I cannot remember what brought me here before: a crash in testing
on one of my machines, which further investigation root-caused elsewhere?
Or a report from someone else? Or something noticed when auditing another
problem? I'm frustrated not to recall.

> 
> I'm not sure if that's indeed the issue or not, but note that this is
> the first time I've managed to trigger that with the fuzzer, and it's
> not that easy to reproduce. Which is a bit odd for code that was there
> for 4 months...

I'm keeping off linux-next for the moment; I'll worry about this
more if it shows up when we try 3.5-rc1.  Your fuzzing tells me that my
logic above is wrong, but maybe it's just a passing defect in next.

Hugh
