On Mon, 11 Jan 2016, Aneesh Kumar K.V wrote:
> Hugh Dickins <hughd@xxxxxxxxxx> writes:
>
> > Swapoff after swapping hangs on the G5. That's because the _PAGE_PTE
> > bit, added by set_pte_at(), is not expected by swapoff: so swap ptes
> > cannot be recognized.
> >
> > I'm not sure whether a swap pte should or should not have _PAGE_PTE set:
> > this patch assumes not, and fixes set_pte_at() to set _PAGE_PTE only on
> > present entries.
>
> One of the reasons we added _PAGE_PTE is to enable HUGETLB migration. So
> we want migration ptes to have _PAGE_PTE set.

Okay, I won't pretend to understand the role of _PAGE_PTE in that;
but if it helps you to have _PAGE_PTE set in (swap and) migration
entries, that's very easily done with the alternative I suggested
for pgtable.h:

-#define __pte_to_swp_entry(pte)  ((swp_entry_t) { pte_val((pte)) })
-#define __swp_entry_to_pte(x)    __pte((x).val)
+#define __pte_to_swp_entry(pte)  ((swp_entry_t) { pte_val(pte) & ~_PAGE_PTE })
+#define __swp_entry_to_pte(x)    __pte((x).val | _PAGE_PTE)

I did test that variant (with set_pte_at() restored to how you have it);
but, not understanding _PAGE_PTE, I thought it odd to have it in a swap
entry.

> >
> > But if that's wrong, a reasonable alternative would be to
> > #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) & ~_PAGE_PTE })
> > #define __swp_entry_to_pte(x) __pte((x).val | _PAGE_PTE)
> >
>
> We do clear _PAGE_PTE bits when converting swp_entry_t to type and
> offset. Can you share the stack trace for the hang, which will help me
> understand this more?

The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c,
since swapoff is circling around and around that function: reading from
each used swap block into a page, then trying to find where that page
belongs by looking at every non-file pte of every mm that ever swapped.
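To make the mismatch concrete, here is a toy user-space model in C of the two encodings quoted above. It is only a sketch under stated assumptions, not kernel code: TOY_PAGE_PTE's bit value and every toy_ type and helper are hypothetical stand-ins, and toy_set_pte_at() models just the one behaviour described in this thread (ORing in _PAGE_PTE). toy_swapoff_finds() re-encodes the swap entry and does a plain bitwise compare, in the spirit of the swp_entry_to_pte() plus pte_same() check swapoff relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value for this toy model; not the real powerpc
 * _PAGE_PTE definition. */
#define TOY_PAGE_PTE 0x1ULL

typedef struct { uint64_t val; } toy_pte_t;       /* stand-in for pte_t */
typedef struct { uint64_t val; } toy_swp_entry_t; /* stand-in for swp_entry_t */

/* Models only what the thread says set_pte_at() does: OR in _PAGE_PTE. */
static void toy_set_pte_at(toy_pte_t *slot, toy_pte_t pte)
{
	slot->val = pte.val | TOY_PAGE_PTE;
}

/* Like pte_same(): a plain bitwise comparison. */
static bool toy_pte_same(toy_pte_t a, toy_pte_t b)
{
	return a.val == b.val;
}

/* Original encoding: the swap entry is copied into the pte unchanged. */
static toy_pte_t toy_old_swp_entry_to_pte(toy_swp_entry_t x)
{
	return (toy_pte_t){ x.val };
}

/* Alternative encoding: set _PAGE_PTE going in... */
static toy_pte_t toy_new_swp_entry_to_pte(toy_swp_entry_t x)
{
	return (toy_pte_t){ x.val | TOY_PAGE_PTE };
}

/* ...and strip it coming back out. */
static toy_swp_entry_t toy_new_pte_to_swp_entry(toy_pte_t pte)
{
	return (toy_swp_entry_t){ pte.val & ~TOY_PAGE_PTE };
}

/* Store the encoded entry as swap-out would, then re-encode it and
 * compare, as swapoff does when hunting for the pte to replace. */
static bool toy_swapoff_finds(toy_swp_entry_t entry,
			      toy_pte_t (*to_pte)(toy_swp_entry_t))
{
	toy_pte_t slot = { 0 };

	toy_set_pte_at(&slot, to_pte(entry));
	return toy_pte_same(slot, to_pte(entry));
}
```

In this model, with the original encoding the stored pte carries the extra bit that the re-encoded form lacks, so the compare never matches and swapoff keeps looping; with the alternative encoding both sides carry the bit (ORing it twice is harmless), and toy_new_pte_to_swp_entry() strips it again so the round trip returns the original entry.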
The code to look at is unuse_pte_range(), which at the top does
	pte_t swp_pte = swp_entry_to_pte(entry);
to get the form it hopes to find in the page table; then scans doing
	if (unlikely(maybe_same_pte(*pte, swp_pte))) {
on each pte slot. Ignoring the MEM_SOFT_DIRTY complication (which had
its own independent bug), maybe_same_pte() just does pte_same().

Hugh