Re: [PATCH] mm: swap: Use swapfiles in priority order

Weijie Yang <weijieut@xxxxxxxxx> · Sun, 16 Feb 2014 10:59:06 +0800

 On Fri, Feb 14, 2014 at 9:10 PM, Christian Ehrhardt
<ehrhardt@xxxxxxxxxxxxxxxxxx> wrote:
> Weijie Yang <weijie.yang.kh <at> gmail.com> writes:
>
>>
>> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman <at> suse.de> wrote:
> [...]
>> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
>> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
>>
> [...]
>> Does it lead to a "schlemiel the painter's algorithm"?
>> (please forgive my rude words, but I can't find a precise word to describe it
>>
>> How about modify it like this?
>>
> [...]
>> - next = swap_list.head;
>> + next = type;
> [...]
>
> Hi,
> unfortunately withou studying the code more thoroughly I'm not even sure if
> you meant you code to extend or replace Mels patch.
>
> To be sure about your intention.  You refered to algorithm scaling because
> you were afraid the new code would scan the full list all the time right ?
>
> But simply letting the machines give a try for both options I can now
> qualify both.
>
> Just your patch creates a behaviour of jumping over priorities (see the
> following example), so I hope you meant combining both patches.
> With that in mind the patch I eventually tested the combined patch looking
> like this:

Hi Christian,

My patch is not appropriate, so there is no need to combine it with Mel's patch.

What I worried about Mel's patch is not only the search efficiency,
actually it has
negligible impact on system, but also the following scenario:

If two swapfiles have the same priority, in ordinary semantic, they
should be used
in balance. But with Mel's patch, it will always get the free
swap_entry from the
swap_list.head in priority order, I worry it could break the balance.

I think you can test this scenario if you have available test machines.

Appreciate for your done.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 612a7c9..53a3873 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -650,7 +650,7 @@ swp_entry_t get_swap_page(void)
>                 goto noswap;
>         atomic_long_dec(&nr_swap_pages);
>
> -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
>                 hp_index = atomic_xchg(&highest_priority_index, -1);
>                 /*
>                  * highest_priority_index records current highest priority swap
> @@ -675,7 +675,7 @@ swp_entry_t get_swap_page(void)
>                 next = si->next;
>                 if (next < 0 ||
>                     (!wrapped && si->prio != swap_info[next]->prio)) {
> -                       next = swap_list.head;
> +                       next = type;
>                         wrapped++;
>                 }
>
>
> At least for the two different cases we identified to fix with it the new
> code works as well:
> I) incrementing swap now in proper priority order
> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  100004  7
> /testswap3                              file            100004  100004  6
> /testswap4                              file            100004  100004  5
> /testswap5                              file            100004  100004  4
> /testswap6                              file            100004  68764   3
> /testswap7                              file            100004  0       2
> /testswap8                              file            100004  0       1
>
> II) comparing a memory based block device "as one" vs "split into 8 pieces"
> as swap target(s).
> Like with Mels patch alone I'm able to achieve 1.5G/s TP on the
> overcommitted memory no matter how much swap targets I split it into.
>
> So while I can't speak for the logical correctness of your addition to the
> patch at least in terms of effectiveness it seems fine.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>