Re: swapoff() runs forever

On Thu, 12 Apr 2012, Richard Weinberger wrote:
> Am 09.04.2012 20:40, schrieb Hugh Dickins:
> > I've not seen any such issue in recent months (or years), but
> > I've not been using UML either.  The most likely cause that springs
> > to mind would be corruption of the vmalloc'ed swap map: that would
> > be very likely to cause such a hang.
> 
> It does not look like a swap map corruption.
> If I restart most user space processes swapoff() terminates fine.

Right, thanks, that's very useful info.

> Maybe it is a refcounting problem?

You may prove to be correct; but since killing and restarting
processes fixes it up without (I presume) issuing warnings,
it doesn't sound like a refcounting problem to me.

> 
> > You say "recent Linux kernels": I wonder what "recent" means.
> > Is this something you can reproduce quickly and reliably enough
> > to do a bisection upon?
> > 
> 
> I can reproduce the issue on any UML kernel.
> The oldest I've tested was 2.6.20.
> Therefore, bug was not introduced by me. B-)

More useful info, thank you.

I think I've spotted two problems in the UML swp_entry_t handling;
but checking if I'm right, and if they're relevant, and how to fix them,
I'll leave to you - it's years since I tried UML and I remember 0.

One, likely to be your problem.  Take a look at unuse_pte_range() in
mm/swapfile.c, where it searches the page table for the swp_pte it's
trying to "unuse".  And take a look at set_pte() in
arch/um/include/asm/pgtable.h, which appears to add a mysterious
_PAGE_NEWPAGE bit into the page table entry.  And UML doesn't provide
an alternative to generic pte_same() in include/asm-generic/pgtable.h.

My guess is that the _NEWPAGE bit prevents swapoff from matching pte
against swap entry in all or some cases (I didn't look to see if
_NEWPAGE is sometimes cleared later).

Probably a good fix to try would be providing a UML pte_same() which
takes that into account; but I don't know what conditionals it should
contain, and whether it would become too inefficient.  Or, if _NEWPAGE
is always set in a swap pte, then swp_entry_to_pte() needs to set it.
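
To make the suspected mismatch concrete, here is a small user-space model,
not kernel code.  The pte_t struct, the toy_set_pte() helper and the 0x002
value for _PAGE_NEWPAGE are assumptions for illustration; the real
definitions live in arch/um/include/asm/pgtable.h and should be checked
there.  masked_pte_same() is only one hypothetical shape a UML pte_same()
could take, and it may well be wrong to ignore the bit in every case.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdbool.h>

/* Toy stand-in for a page table entry; the kernel's pte_t is arch-specific. */
typedef struct { unsigned long pte; } pte_t;

/* Assumed value, for illustration only. */
#define _PAGE_NEWPAGE 0x002UL

/* Mirrors a plain equality test on the raw entry, as a generic pte_same() does. */
static bool generic_pte_same(pte_t a, pte_t b)
{
    return a.pte == b.pte;
}

/* Hypothetical UML variant: treat entries as equal if they differ only in
 * the _PAGE_NEWPAGE bookkeeping bit. */
static bool masked_pte_same(pte_t a, pte_t b)
{
    return ((a.pte ^ b.pte) & ~_PAGE_NEWPAGE) == 0;
}

/* Models a set_pte() that ORs the bookkeeping bit into the stored entry. */
static pte_t toy_set_pte(pte_t val)
{
    val.pte |= _PAGE_NEWPAGE;
    return val;
}
```

In this model, an entry stored through toy_set_pte() no longer compares
equal to the value swapoff would search for under generic_pte_same(), but
does under masked_pte_same().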

(A word of warning if you're unfamiliar with swap entries: there's the
kernel's internal representation swp_entry_t, which has offset in the
low-order and type in the high-order, for efficient use with radix_tree
- see include/linux/swapops.h; and then there's the arch-dependent
representation as a page table entry, which rearranges the bits so
as not to be confused with a good present page table entry, and
traditionally has type on the lower side of offset.)
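
A rough user-space sketch of the internal layout described above, with
type in the high-order bits and offset in the low-order bits.  The toy_
names and the 5-bit type width are assumptions made for this example; the
authoritative helpers are in include/linux/swapops.h.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <limits.h>

/* Toy model of the kernel-internal swap entry. */
typedef struct { unsigned long val; } toy_swp_entry_t;

/* Assumed: 5 bits of type, enough for 32 swap areas. */
#define TOY_TYPE_BITS   5
#define TOY_TYPE_SHIFT  (sizeof(unsigned long) * CHAR_BIT - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((1UL << TOY_TYPE_SHIFT) - 1)

/* Pack type into the top bits and offset into the remaining low bits. */
static toy_swp_entry_t toy_swp_entry(unsigned long type, unsigned long offset)
{
    toy_swp_entry_t e = { (type << TOY_TYPE_SHIFT) | (offset & TOY_OFFSET_MASK) };
    return e;
}

static unsigned long toy_swp_type(toy_swp_entry_t e)
{
    return e.val >> TOY_TYPE_SHIFT;
}

static unsigned long toy_swp_offset(toy_swp_entry_t e)
{
    return e.val & TOY_OFFSET_MASK;
}
```

With this layout an entry of type 0 is numerically just its offset, which
is what makes it convenient as a radix_tree index; the arch pte format
then shuffles the same two fields into different bit positions.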

The other thing I noticed first, probably not relevant to the bug you're
seeing since I think you'd have mentioned if you had two swapfiles; but
the two or more swapfile case looks very broken to me.  _PAGE_PROTNONE is
0x010 but __swp_type(x) is (((x).val >> 4) & 0x3f): unless I'm confused,
a swap entry of type 1 will look just like a PROT_NONE pte.
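
The overlap can be checked with a few lines of user-space C.  The shifts,
the 0x3f mask and the 0x010 value are the ones quoted above; the struct
and function names are made up for this sketch, and the << 4 in the
encoder is inferred from the >> 4 in __swp_type.

```c
#include <assert.h>   /* only needed for the checks in the note below */

/* Toy stand-in for the arch-level swap entry. */
typedef struct { unsigned long val; } old_swp_entry_t;

/* Value quoted in the message. */
#define _PAGE_PROTNONE 0x010UL

/* Current encoding as described: type at bit 4, offset at bit 11. */
static old_swp_entry_t old_swp_entry(unsigned long type, unsigned long offset)
{
    old_swp_entry_t e = { (type << 4) | (offset << 11) };
    return e;
}

static unsigned long old_swp_type(old_swp_entry_t e)
{
    return (e.val >> 4) & 0x3f;
}

/* True if the encoded entry has the _PAGE_PROTNONE bit set. */
static int looks_protnone(old_swp_entry_t e)
{
    return (e.val & _PAGE_PROTNONE) != 0;
}
```

Type 1 encodes to 0x10, exactly _PAGE_PROTNONE.  In this model every
odd-numbered type sets that bit, so type 0 (a single swapfile) is
unaffected.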

Or maybe that's resolved by the _PAGE_NEWPAGE and _PAGE_NEWPROT bits,
I didn't spend time working out what they're up to.

include/linux/swap.h does not allow MAX_SWAPFILES to exceed 32,
so you can easily change __swp_type(x) to use 5 and 0x1f instead
(with 5 instead of 4 in __swp_entry too of course).  Though it doesn't
cause error, I wonder where the 11 in __swp_offset and __swp_entry
comes from: I think you can support larger swap by making it 10.
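
A sketch of the suggested encoding, again as a user-space model with
made-up names: type shifted by 5 with mask 0x1f, and offset starting at
bit 10.  It only demonstrates that the type field no longer touches bit 4;
whether other flag bits (_PAGE_NEWPAGE, _PAGE_NEWPROT and so on) interfere
is not checked here.

```c
#include <assert.h>   /* only needed for the checks in the note below */

/* Toy stand-in for the arch-level swap entry. */
typedef struct { unsigned long val; } new_swp_entry_t;

/* Value quoted in the message. */
#define _PAGE_PROTNONE 0x010UL

/* Proposed encoding: 5-bit type at bit 5, offset from bit 10 upward. */
static new_swp_entry_t new_swp_entry(unsigned long type, unsigned long offset)
{
    new_swp_entry_t e = { (type << 5) | (offset << 10) };
    return e;
}

static unsigned long new_swp_type(new_swp_entry_t e)
{
    return (e.val >> 5) & 0x1f;
}

static unsigned long new_swp_offset(new_swp_entry_t e)
{
    return e.val >> 10;
}

/* Returns 1 if none of the 32 possible types sets the _PAGE_PROTNONE bit. */
static int no_type_sets_protnone(void)
{
    unsigned long type;

    for (type = 0; type < 32; type++)
        if (new_swp_entry(type, 0).val & _PAGE_PROTNONE)
            return 0;
    return 1;
}
```

Type and offset round-trip through these helpers, and moving the offset
from bit 11 down to bit 10 doubles the number of offsets that fit in the
same word.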

Hugh
