On Thu, Mar 20, 2014 at 09:47:04PM -0400, Sasha Levin wrote:
> On 02/10/2014 04:44 PM, Naoya Horiguchi wrote:
> > swapin_walk_pmd_entry() is defined as pmd_entry(), but it has no code
> > about pmd handling (except pmd_none_or_trans_huge_or_clear_bad, but the
> > same checks are now done in the core page table walk code).
> > So let's move this function to pte_entry() as swapin_walk_pte_entry().
> >
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
>
> This patch seems to generate:

Sasha, thank you for reporting. I forgot to unlock the ptlock before
entering read_swap_cache_async(), which takes the page lock internally.
As a result, the lock ordering rule (documented in mm/rmap.c) was
violated: locks must be taken in the order mmap_sem -> page lock ->
ptlock. The following patch should fix this. Could you test with it?

---
From c0d56af5874dc40467c9b3a0f9e53b39b3c4f1c5 Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Date: Thu, 20 Mar 2014 22:30:51 -0400
Subject: [PATCH] madvise: fix locking in force_swapin_readahead()

We take mmap_sem and the ptlock when walking over ptes with
swapin_walk_pte_entry(), but inside it we call read_swap_cache_async(),
which takes the page lock. So we should unlock the ptlock before calling
read_swap_cache_async() to meet the lock ordering rule
(mmap_sem -> page lock -> ptlock).
Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
---
 mm/madvise.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 5e957b984c14..ed9c31e3b5ff 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -141,24 +141,35 @@ static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
 	swp_entry_t entry;
 	struct page *page;
 	struct vm_area_struct *vma = walk->vma;
+	spinlock_t *ptl = (spinlock_t *)walk->private;

 	if (pte_present(*pte) || pte_none(*pte) || pte_file(*pte))
 		return 0;
 	entry = pte_to_swp_entry(*pte);
 	if (unlikely(non_swap_entry(entry)))
 		return 0;
+	spin_unlock(ptl);
 	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, start);
+	spin_lock(ptl);
 	if (page)
 		page_cache_release(page);
 	return 0;
 }

+static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
+		unsigned long end, struct mm_walk *walk)
+{
+	walk->private = pte_lockptr(walk->mm, pmd);
+	return 0;
+}
+
 static void force_swapin_readahead(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end)
 {
 	struct mm_walk walk = {
 		.mm = vma->vm_mm,
+		.pmd_entry = swapin_walk_pmd_entry,
 		.pte_entry = swapin_walk_pte_entry,
 	};
-- 
1.8.5.3