On Tue, Oct 15, 2013 at 02:32:54PM +0300, Kirill A. Shutemov wrote:
> Hugh Dickins wrote:
> > Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
> > __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
> >
> > It's invalid: we don't always have down_write of mmap_sem there:
> > a racing do_huge_pmd_wp_page() might have copied-on-write to another
> > huge page before our split_huge_page() got the anon_vma lock.
> >
> > Forget the BUG_ON, just go back and try again if this happens.
> >
> > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
>
> Looks reasonable to me.
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> madvise(MADV_DONTNEED) was problematic with THP before. Is it a big win
> having mmap_sem taken on read rather than on write for it?

Yeah, it caused all those pmd_trans_unstable and
pmd_none_or_trans_huge_or_clear_bad and pmd_read_atomic in common
code. But I didn't want to regress the scalability of
MADV_DONTNEED... I think various apps use MADV_DONTNEED to free memory
(including KVM in the balloon driver and probably JVM and other JIT).

None or huge pmds are unstable without mmap_sem for writing and
without page_table_lock (or in general pmd_trans_huge_lock). It's
identical to the pte being unstable if mmap_sem is held for reading
and we don't hold the PT lock, except the pte can only have two states
and they're both unstable. Hugepmds have three states, and the only
stable state of the three is when it points to a regular pte (the
third state that 4k ptes cannot have).
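
To make the three-state argument and the "go back and try again" fix concrete, here is a rough userspace sketch in C. It is a toy model, not kernel code: the names toy_pmd, toy_split, toy_split_huge_pmd, pmd_state_is_stable and the racing_cows counter are all made up for illustration, and the race with do_huge_pmd_wp_page() is simulated deterministically rather than with real threads or locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the three states a pmd can be in. */
enum pmd_state {
    PMD_NONE,        /* nothing mapped */
    PMD_TRANS_HUGE,  /* maps a transparent huge page */
    PMD_PTE_TABLE    /* points to a regular page table of 4k ptes */
};

/*
 * With mmap_sem held only for reading and no page table lock,
 * none and huge pmds can change under us; only the pte-table
 * state is stable.
 */
static bool pmd_state_is_stable(enum pmd_state s)
{
    return s == PMD_PTE_TABLE;
}

struct toy_pmd {
    enum pmd_state state;
    int racing_cows;  /* hypothetical: racing copy-on-writes still pending */
};

/*
 * Hypothetical split step. If a racing COW gets in first, the pmd
 * now maps a different huge page and is still huge afterwards.
 */
static void toy_split(struct toy_pmd *pmd)
{
    if (pmd->racing_cows > 0) {
        pmd->racing_cows--;
        pmd->state = PMD_TRANS_HUGE;  /* COW installed a new huge page */
        return;
    }
    pmd->state = PMD_PTE_TABLE;       /* split succeeded */
}

/*
 * Retry instead of asserting: keep splitting while the pmd is still
 * huge. Returns the number of split attempts that were needed.
 */
static int toy_split_huge_pmd(struct toy_pmd *pmd)
{
    int attempts = 0;

    assert(pmd != NULL);
    while (pmd->state == PMD_TRANS_HUGE) {
        toy_split(pmd);
        attempts++;
    }
    return attempts;
}
```

In this model, a pmd that starts huge with two pending racing COWs needs three split attempts before it reaches the pte-table state. A check like the old BUG_ON would have fired after the first attempt, which is the case the patch replaces with a retry.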