Re: [RFC v6 1/2] mm: prototype: rid swapoff of quadratic complexity

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 11 Nov 2014, Kelley Nielsen wrote:

> Changes since v5:
> * Make multiple orphan checking passes at the end of try_to_unuse
> * Remove pages from swapcache if we hold the only reference in
> * unuse_pte_range

I'm very glad you're back on this, Kelley, thank you.

I gave it a try this weekend, and at first I was very pleased with
the results - up until make died with an unexplained SIGSEGV, but
that was only after several hours (an unexplained SIGSEGV usually
means a swap error of some kind in my testing).

At the time I put that down to bringing KSM in, and I wasn't
expecting KSM to be right yet anyway.  But after looking at the code,
I was a little disappointed to find the MAX_RETRIES 3 stuff: and
there is a particular bug there which was probably responsible for
my SIGSEGV - when try_to_unuse() has done MAX_RETRIES and gives up,
it returns 0 pretending success: no, please make that, say, -EBUSY.

When I made that change, and at the same time switched from testing
swap on SSD to swap on HDD, it didn't go so well: the eighth swapoff
(under concurrent swapping load) then admitted that it had failed.

Even -EBUSY is not really right there: I do like your improvements,
and want to get them in, but it is a regression to fail under load
like that.  And I imagine that the bigger the swap area, or the
slower the disk, or the heavier the load, then the more likely a
failure is in that case.

It's good that you're now removing from swapcache when you can in
unuse_pte_range(), but that's just not enough: I'm pretty sure it
does need the rmap walk that I suggested back in April (and I'm
hopeful that with that, we could then get rid of those retries -
hopeful but far from certain).  So that a page can be taken out of
swap in one go, with no danger of being repeatedly refaulted in
behind you.

I'm not sure where you want to take it from here.  I'd be happy
for this to go into mmotm (with the changes Rik proposed for 2/2,
and the -EBUSY at MAX_RETRIES), so that it gets wider exposure
(few people will in practice be disadvantaged by that -EBUSY):
so long as one of us commits to fixing up the regression before
it goes to Linus.

I wonder what your time situation is: I wonder if you just haven't
found time to add the rmap walk, or are not confident in doing so.

To be honest, I'd rather like to do that bit myself: apart from
anything else, it is the only way in which I will properly get to
grips with your changes - I'm a coder not a reviewer.  But there
is no chance whatever that I'll get to do it before Christmas,
and whether I find time to do it then rather depends on what
else comes up that may demand higher priority.

And, I don't mean to pressure you with more than you have time for,
but if you do have time, please remember to remove the CONFIG_SHMEM
CONFIG_SWAP stuff from lib/radix-tree.c and radix-tree.h (Cc
Johannes Weiner <hannes@xxxxxxxxxxx> and Konstantin Khlebnikov
<koct9i@xxxxxxxxx>, they will give a cheer), and rename all those
"unuse"s to "swapoff"s (Cc Oleg Nesterov <oleg@xxxxxxxxxx> and
Pavel Machek <pavel@xxxxxx>) to be clearer, and to avoid
confusion with the other unuse_mm().

Of course, anyone can do those cleanups, if you don't have the time:
but you have earned the right, so I'd rather see them in your name.

By the way, I'll Cc you in a few moments on a swapoff fix I tracked
down last week: nothing to do with your changes or my testing of them,
but something I found in testing an unrelated patchset.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]