On Thu, 24 May 2012, Sasha Levin wrote:
> On Thu, May 24, 2012 at 9:07 PM, Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, 24 May 2012 20:27:34 +0200
> > Sasha Levin <levinsasha928@xxxxxxxxx> wrote:
> >
> >> Hi all,
> >>
> >> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> >>
> >> [ 2043.098949] ------------[ cut here ]------------
> >> [ 2043.099014] kernel BUG at mm/memory.c:1230!
> >
> > That's
> >
> >       VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
> >
> > in zap_pmd_range()?
>
> Yup.
>
> > The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add
> > debug checks for mapcount related invariants").  AFAICT it's just wrong
> > on the exit path.  Unclear why it's triggering now...

I've been round this loop before with that particular VM_BUG_ON.

At first I thought like Andrew, that it's glaringly wrong on the exit
path; but then changed my mind.

When munmapping, we certainly can arrive here with an unaligned addr
and next; but in that case rwsem_is_locked.

Whereas in exiting, rwsem is not locked, but we're going linearly
upwards, and whenever we walk into a pmd_trans_huge area, both addr
and next should be hpage aligned: the vma bounds are unsuited to THP
if they're unaligned.

Other cases equally should not arise: madvise MADV_DONTNEED should
have rwsem_is_locked; and truncation or hole-punching shouldn't be
possible on a pure-anonymous (!vma->vm_ops) area considered for THP.

But I cannot remember what brought me here before: a crash in testing
on one of my machines, which further investigation root-caused elsewhere?
or a report from someone else?  or noticed when auditing another problem?
I'm frustrated not to recall.

>
> I'm not sure if that's indeed the issue or not, but note that this is
> the first time I've managed to trigger that with the fuzzer, and it's
> not that easy to reproduce. Which is a bit odd for code that was there
> for 4 months...

I'm keeping off linux-next for the moment; I'll worry about this
more if it shows up when we try 3.5-rc1.  Your fuzzing tells me that my
logic above is wrong, but maybe it's just a passing defect in next.

Hugh
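
For readers following the argument, the invariant defended above can be
modelled in user space with a short C sketch.  It is not the kernel code:
struct zap_step, covers_whole_hpage() and zap_step_ok() are hypothetical
names invented for illustration, the 2 MiB huge page size is an assumption
(x86-64 with 4 KiB base pages), and mmap_sem_locked merely stands in for
what rwsem_is_locked(&tlb->mm->mmap_sem) would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB huge pages, as on x86-64 with 4 KiB base pages. */
#define HPAGE_PMD_SIZE (UINT64_C(1) << 21)
#define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))

/* Hypothetical stand-in for the state seen at one step of the pmd walk. */
struct zap_step {
	uint64_t addr;        /* start of the range within this pmd */
	uint64_t next;        /* end of the range within this pmd */
	bool pmd_trans_huge;  /* pmd currently maps a transparent huge page */
	bool mmap_sem_locked; /* models rwsem_is_locked() on mmap_sem */
};

/* True when the step covers exactly one aligned huge page. */
static bool covers_whole_hpage(const struct zap_step *s)
{
	return (s->addr & ~HPAGE_PMD_MASK) == 0 &&
	       s->next - s->addr == HPAGE_PMD_SIZE;
}

/*
 * The invariant argued for: a huge pmd may only be zapped partially
 * (which would force a split) while mmap_sem is held.  Returns false
 * in the situation where the VM_BUG_ON would fire.
 */
static bool zap_step_ok(const struct zap_step *s)
{
	if (!s->pmd_trans_huge || covers_whole_hpage(s))
		return true;
	return s->mmap_sem_locked;
}
```

In this model an exit-path step over an aligned huge page passes without
the lock, a partial munmap passes because the lock is held, and only a
partial zap of a huge pmd with the lock not held fails, which is the
combination the fuzzer appears to have reached.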