Hello Kirill, On Mon, Aug 29, 2016 at 03:42:33PM +0300, Kirill A. Shutemov wrote: > @@ -898,13 +899,13 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, > /* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */ > if (ret & VM_FAULT_RETRY) { > down_read(&mm->mmap_sem); > - if (hugepage_vma_revalidate(mm, address)) { > + if (hugepage_vma_revalidate(mm, address, &vma)) { > /* vma is no longer available, don't continue to swapin */ > trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); > return false; > } > /* check if the pmd is still valid */ > - if (mm_find_pmd(mm, address) != pmd) > + if (mm_find_pmd(mm, address) != pmd || vma != fe.vma) > return false; > } > if (ret & VM_FAULT_ERROR) { You check if the vma changed if the mmap_sem was released by the VM_FAULT_RETRY case but not below: /* * Prevent all access to pagetables with the exception of * gup_fast later handled by the ptep_clear_flush and the VM > @@ -994,7 +995,7 @@ static void collapse_huge_page(struct mm_struct *mm, > * handled by the anon_vma lock + PG_lock. > */ > down_write(&mm->mmap_sem); > - result = hugepage_vma_revalidate(mm, address); > + result = hugepage_vma_revalidate(mm, address, &vma); > if (result) > goto out; > /* check if the pmd is still valid */ if (mm_find_pmd(mm, address) != pmd) goto out; Here you go ahead without care if the vma has changed as long as the "vma" pointer was updated to the new one, and the pmd is still present and stable (present and not huge) and all vma details matched as before. Either we care that the vma changed in both places or we don't in either of the two places. The idea was that even if the vma changed it doesn't matter because it's still good to proceed for a collapse if all revalidation check pass. What we failed at, was in refreshing the pointer of the vma to the new one after the vma revalidation passed, so that the code that goes ahead uses the right vma pointer and not the stale one we got initially. Now it may give a perception that it is safer to check fa.vma != vma but in reality it is not, because the vma may be freed and reallocated in exactly the same address... So I think the vma != fe.vma check shall be removed because no matter what the safety of the vma revalidate cannot come from checking if the pointer has not changed and it must come from something else. Now reading __collapse_huge_page_swapin I noticed some other unrelated issues. if (referenced < HPAGE_PMD_NR/2) { trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); return false; } Referenced is not updated ever by __collapse_huge_page_swapin. In turn the above check would better be moved out of the loop, before we start to kmap the pte. Bigger issue is that leaving it where it is right now, we don't seem to unmap the pte as needed when the above "return false" above runs, which means we're leaking kmap_atomic entries, and that will also get fixed by moving the check before the loop starts and before the first pte_offset_map. swapped_in will showup zero instead of 1 in the tracing at all times but I doubt it makes any difference so I would move it out of the loop instead of adding a pte_unmap before returning, so it runs faster. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>