Re: [patch 4/4 v3]swap: make cluster allocation per-cpu

Hugh Dickins <hughd@xxxxxxxxxx> · Wed, 20 Mar 2013 14:58:17 -0700 (PDT)

On Wed, 20 Mar 2013, Andrew Morton wrote:
> On Tue, 19 Mar 2013 16:09:01 -0700 (PDT) Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> 
> > But I'm not all that keen on this one.  Partly because I suspect that
> > this per-cpu'ing won't in the end be the right approach
> 
> That was my reaction.  The CPU isn't the logical thing upon which to
> key the clustering.  It mostly-works, because of the way in which the
> kernel operates but it's a bit of a flukey hack.  A more logical thing
> around which to subdivide the clustering is the mm_struct.

You do suggest that from time to time, and someone did once send a patch
to organize it by vma.  That probably behaves very nicely under a simple
load, when pages coming off the bottom of the lru are from increasing
addresses of the same mm; but what we have already works well enough
for such a simple case (or should do: bugs can creep in and upset it).

Under a heavier mixed load, it behaved much worse than what we do
at present.  That was in swapping to hard disk, where the additional
seeks to place pages from different vmas in separate locations were
costly; SSDs don't have seek cost, but I'd expect their erase blocks
to impose an equivalent (not necessarily equal) cost.

One of the great attractions of SSD for swap is the absence of seek
cost when faulting back in; and even with hard disk, we don't know
whether or when pages will be faulted back in.  The better we can
allocate contiguously when swapping out, the faster swap will be.
I say we need to allocate disk location just in time before writing.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>