Re: [PATCH 3/5] mm/vmalloc.c: correct lazy_max_pages() return value

zijun_hu <zijun_hu@xxxxxxxx> · Fri, 23 Sep 2016 13:00:35 +0800



On 2016/9/23 11:30, Nicholas Piggin wrote:
> On Fri, 23 Sep 2016 00:30:20 +0800
> zijun_hu <zijun_hu@xxxxxxxx> wrote:
> 
>> On 2016/9/22 20:37, Michal Hocko wrote:
>>> On Thu 22-09-16 09:13:50, zijun_hu wrote:  
>>>> On 09/22/2016 08:35 AM, David Rientjes wrote:  
>>> [...]  
>>>>> The intent is as it is implemented; with your change, lazy_max_pages() is 
>>>>> potentially increased depending on the number of online cpus.  This is 
>>>>> only a heuristic, changing it would need justification on why the new
>>>>> value is better.  It is opposite to what the comment says: "to be 
>>>>> conservative and not introduce a big latency on huge systems, so go with
>>>>> a less aggressive log scale."  NACK to the patch.
>>>>>  
>>>> My change potentially decreases lazy_max_pages() rather than increasing it,
>>>> so it seems to conform with the comment.
>>>>
>>>> If the number of online CPUs is not a power of 2, the two versions give the
>>>> same result. Otherwise, my change keeps the power-of-2 value, whereas the
>>>> original code rounds up to the next power of 2. For instance:
>>>>
>>>> my change : (32, 64] -> 64
>>>> 	     32 -> 32, 64 -> 64
>>>> the original code: [32, 63] -> 64
>>>>                    32 -> 64, 64 -> 128  
>>>
>>> You still completely failed to explain _why_ this is an improvement/fix
>>> or why it matters. This all should be in the changelog.
>>>   
>>
>> Hi npiggin,
>> could you comment on this patch, since lazy_max_pages() was introduced
>> by you?
>>
>> My patch is based mainly on the difference between fls() and
>> get_count_order(), shown below. Other MM experts may be able to help
>> decide which is more suitable.
>>
>> For a parameter > 1, the two return different values only when the
>> parameter is a power of two, for example:
>>
>> fls(32) = 6 VS get_count_order(32) = 5
>> fls(33) = 6 VS get_count_order(33) = 6
>> fls(63) = 6 VS get_count_order(63) = 6
>> fls(64) = 7 VS get_count_order(64) = 6
>>
>> @@ -594,7 +594,9 @@ static unsigned long lazy_max_pages(void) 
>> { 
>>     unsigned int log; 
>>
>> -    log = fls(num_online_cpus()); 
>> +    log = num_online_cpus(); 
>> +    if (log > 1) 
>> +        log = (unsigned int)get_count_order(log); 
>>
>>     return log * (32UL * 1024 * 1024 / PAGE_SIZE); 
>> } 
>>
> 
> To be honest, I don't think I chose it with a lot of analysis.
> It will depend on the kernel usage patterns, the arch code,
> and the CPU microarchitecture, all of which would have changed
> significantly.
> 
> I wouldn't bother changing it unless you do some benchmarking
> on different system sizes to see where the best performance is.
> (If performance is equal, fewer lazy pages would be better.)
> 
> Good to see you taking a look at this vmalloc stuff. Don't be
> discouraged if you run into some dead ends.
> 
> Thanks,
> Nick
> 
Thanks for your reply.
Please disregard this patch; I am not in a position to run the many
tests and comparisons that would be needed.

I just felt that my change might be more consistent with the operation of
rounding up to a power of 2.
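
For anyone who wants to check the numbers quoted above, here is a minimal
userspace sketch of the two variants. It is not kernel code: fls_sketch(),
get_count_order_sketch(), old_log() and new_log() are hypothetical stand-ins
written for illustration, assuming fls() returns the 1-based index of the
highest set bit and get_count_order() returns ceil(log2(count)). The
(32UL * 1024 * 1024 / PAGE_SIZE) factor is left out because it is the same in
both versions.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int fls_sketch(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Stand-in for get_count_order(): ceil(log2(count)) for count >= 1. */
static int get_count_order_sketch(unsigned int count)
{
	int order = fls_sketch(count) - 1;

	/* Not a power of two: round up to the next order. */
	if (count & (count - 1))
		order++;
	return order;
}

/* Multiplier used by the existing code: fls(num_online_cpus()). */
static unsigned int old_log(unsigned int cpus)
{
	return (unsigned int)fls_sketch(cpus);
}

/* Multiplier proposed in the patch: get_count_order() when cpus > 1. */
static unsigned int new_log(unsigned int cpus)
{
	unsigned int log = cpus;

	if (log > 1)
		log = (unsigned int)get_count_order_sketch(log);
	return log;
}

/* Print both multipliers and re-check the examples quoted in this thread. */
static void compare_examples(void)
{
	static const unsigned int cpus[] = { 32, 33, 63, 64 };
	unsigned int i;

	for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		printf("cpus=%u old=%u new=%u\n",
		       cpus[i], old_log(cpus[i]), new_log(cpus[i]));

	assert(old_log(32) == 6 && new_log(32) == 5);
	assert(old_log(64) == 7 && new_log(64) == 6);
}
```

Calling compare_examples() from a small main() prints the multipliers side by
side. The two variants differ only when the CPU count is a power of two, where
the proposed version is one step smaller.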

