Re: [PATCH 7/7] slub: initial bulk free implementation

2015-06-16 17:57 GMT+09:00 Jesper Dangaard Brouer <brouer@xxxxxxxxxx>:
> On Tue, 16 Jun 2015 10:21:10 +0200
> Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:
>
>>
>> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote:
>>
>> > Is this really better than just calling __kmem_cache_free_bulk()?
>>
>> Yes, as can be seen in the cover letter, but my cover letter does not
>> seem to have reached the mm-list.
>>
>> Measurements for the entire patchset:
>>
>> Bulk - Fallback bulking           - fastpath-bulking
>>    1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns   improved  4.3%
>>    2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns   improved 39.1%
>>    3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns   improved 52.2%
>>    4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns   improved 57.8%
>>    8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns   improved 62.2%
>>   16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns   improved 61.4%
>>   30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns   improved 24.0%
>>   32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns   improved 23.3%
>>   34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns   improved 22.7%
>>   48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns   improved 20.0%
>>   64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns   improved 14.9%
>>  128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns   improved  9.2%
>>  158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns   improved  7.9%
>>  250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns   improved  3.8%
>>
>> I'll do a compare against the previous patch, and post the results.
>
> Compare against previous patch:
>
> Run:   previous-patch            - this patch
>   1 -   49 cycles(tsc) 12.378 ns -  43 cycles(tsc) 10.775 ns  improved 12.2%
>   2 -   37 cycles(tsc)  9.297 ns -  26 cycles(tsc)  6.652 ns  improved 29.7%
>   3 -   33 cycles(tsc)  8.348 ns -  21 cycles(tsc)  5.347 ns  improved 36.4%
>   4 -   31 cycles(tsc)  7.930 ns -  18 cycles(tsc)  4.669 ns  improved 41.9%
>   8 -   30 cycles(tsc)  7.693 ns -  17 cycles(tsc)  4.404 ns  improved 43.3%
>  16 -   32 cycles(tsc)  8.059 ns -  17 cycles(tsc)  4.493 ns  improved 46.9%

So, in your test, most of the objects probably come from just one or two
slabs, and your algorithm is well optimized for that case. But is this
workload the normal case? If most of the objects come from many different
slabs, the bulk free API would enable/disable interrupts very frequently,
so I guess it would perform worse than just calling
__kmem_cache_free_bulk(). Could you test this case?
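
For reference, the generic fallback I am comparing against is, as far as
I recall the patchset, essentially just a loop over the regular free path
(a sketch, not copied verbatim from the patch):

	void __kmem_cache_free_bulk(struct kmem_cache *s, size_t nr, void **p)
	{
		size_t i;

		/* No batching: each object takes the normal free path. */
		for (i = 0; i < nr; i++)
			kmem_cache_free(s, p[i]);
	}

It pays the full kmem_cache_free() cost per object, but it never keeps
interrupts disabled across objects.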
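
And here is the case I worry about, sketched with hypothetical helpers
(object_on_cpu_slab() and push_cpu_freelist() are illustrative names I
made up, not functions from the patch):

	/*
	 * Worst case for a fastpath bulk free: every object belongs to a
	 * different slab, so each iteration falls out of the irq-disabled
	 * batch and pays the interrupt enable/disable cost per object.
	 */
	void bulk_free_worst_case(struct kmem_cache *s, size_t nr, void **p)
	{
		size_t i;

		local_irq_disable();
		for (i = 0; i < nr; i++) {
			if (object_on_cpu_slab(s, p[i])) {
				/* cheap: batch onto the per-CPU freelist */
				push_cpu_freelist(s, p[i]);
			} else {
				/* different slab: leave the batch, free slowly */
				local_irq_enable();
				kmem_cache_free(s, p[i]);
				local_irq_disable();
			}
		}
		local_irq_enable();
	}

If the objects alternate between slabs, interrupts get toggled roughly
2 * nr times, which is exactly the overhead the plain loop avoids.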

Thanks.



