Re: [RESEND v4 5/7] mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 16, 2019 at 5:38 AM David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> On Mon, 16 Sep 2019, Pengfei Li wrote:
>
> > Currently, kmalloc_cache[] is not sorted by size, kmalloc_cache[0]
> > is kmalloc-96, kmalloc_cache[1] is kmalloc-192 (when ARCH_DMA_MINALIGN
> > is not defined).
> >
> > As suggested by Vlastimil Babka,
> >
> > "Since you're doing these cleanups, have you considered reordering
> > kmalloc_info, size_index, kmalloc_index() etc so that sizes 96 and 192
> > are ordered naturally between 64, 128 and 256? That should remove
> > various special casing such as in create_kmalloc_caches(). I can't
> > guarantee it will be possible without breaking e.g. constant folding
> > optimizations etc., but seems to me it should be feasible. (There are
> > definitely more places to change than those I listed.)"
> >
> > So this patch reordered kmalloc_info[], kmalloc_caches[], and modified
> > kmalloc_index() and kmalloc_slab() accordingly.
> >
> > As a result, there is no subtle judgment about size in
> > create_kmalloc_caches(). And initialize kmalloc_cache[] from 0 instead
> > of KMALLOC_SHIFT_LOW.
> >
> > I used ./scripts/bloat-o-meter to measure the impact of this patch on
> > performance. The results show that it brings some benefits.
> >
> > Considering the size change of kmalloc_info[], the size of the code is
> > actually about 641 bytes less.
> >
>
> bloat-o-meter is reporting a net benefit of -241 bytes for this, so not
> sure about relevancy of the difference for only kmalloc_info.
>

Thanks for your comments.

The size of kmalloc_info has been increased from 432 to 832 (it was
renamed to all_kmalloc_info ). So when the change in kmalloc_info size
is not included, it actually reduces 641 bytes.

> This, to me, looks like increased complexity for the statically allocated
> arrays vs the runtime complexity when initializing the caches themselves.

For runtime kmalloc requests, the implementation of kmalloc_slab() is
no different than before.
For constant kmalloc requests, the smaller size of .text means better
(the compiler does constant optimization).
Therefore, I don't think this patch adds complexity.

> Not sure that this is an improvement given that you still need to do
> things like
>
> +#if KMALLOC_SIZE_96_EXIST == 1
> +       if (size > 64 && size <= 96) return (7 - KMALLOC_IDX_ADJ_0);
> +#endif
> +
> +#if KMALLOC_SIZE_192_EXIST == 1
> +       if (size > 128 && size <= 192) return (8 - KMALLOC_IDX_ADJ_1);
> +#endif

kmalloc_index() is difficult to handle for me.

At first, I made the judgment in the order of size in kmalloc_index(),

----
/* Order 96, 192 */
static __always_inline unsigned int kmalloc_index(size_t size)
{
...
if (size <=                8) return ( 3 - KMALLOC_IDX_ADJ_0);
if (size <=               16) return ( 4 - KMALLOC_IDX_ADJ_0);
if (size <=               32) return ( 5 - KMALLOC_IDX_ADJ_0);
if (size <=               64) return ( 6 - KMALLOC_IDX_ADJ_0);
#if KMALLOC_SIZE_96_EXIST == 1
if (size <=               96) return ( 7 - KMALLOC_IDX_ADJ_0);
#endif
if (size <=              128) return ( 7 - KMALLOC_IDX_ADJ_1);
#if KMALLOC_SIZE_192_EXIST == 1
if (size <=              192) return ( 8 - KMALLOC_IDX_ADJ_1);
#endif
if (size <=              256) return ( 8 - KMALLOC_IDX_ADJ_2);
...
}

but bloat-o-meter shows that I did a bad job.
----
$ ./scripts/bloat-o-meter vmlinux-base vmlinux-patch_1-5-order_96_192
add/remove: 3/7 grow/shrink: 129/167 up/down: 3691/-2530 (1161)
Function                                     old     new   delta
all_kmalloc_info                               -     832    +832
jhash                                        744    1119    +375
__regmap_init                               3252    3411    +159
drm_mode_atomic_ioctl                       2373    2479    +106
apply_wqattrs_prepare                        449     531     +82
process_preds                               1772    1851     +79
amd_uncore_cpu_up_prepare                    251     327     +76
property_entries_dup.part                    789     861     +72
pnp_register_port_resource                    98     167     +69
pnp_register_mem_resource                     98     167     +69
pnp_register_irq_resource                    146     206     +60
pnp_register_dma_resource                     61     121     +60
pcpu_get_vm_areas                           3086    3139     +53
sr_probe                                    1360    1409     +49
fl_create                                    675     724     +49
ext4_expand_extra_isize_ea                  2218    2265     +47
fib6_info_alloc                               60     105     +45
init_worker_pool                             247     291     +44
ctnetlink_alloc_filter.part                    -      43     +43
alloc_workqueue                             1229    1270     +41
...
Total: Before=14789209, After=14790370, chg +0.01%

It increased by 1161 bytes.

I tried to modify it many times until the special judgment of 96, 192
was placed at the beginning of the function, and the bloat-o-meter
showed a reduction of 241 bytes.

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-patch_1-5
add/remove: 1/2 grow/shrink: 6/64 up/down: 872/-1113 (-241)
Total: Before=14789209, After=14788968, chg -0.00%

Therefore, the implementation of kmalloc_index() in the patch is
intentional.

In addition, the above data was generated from my laptop. But with the
same code and kernel configuration, it shows different test results on
my PC (probably due to different versions of GCC).

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-patch_1-5
add/remove: 1/2 grow/shrink: 6/70 up/down: 856/-1062 (-206)

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-patch_1-5-order_96_192
add/remove: 1/2 grow/shrink: 12/71 up/down: 989/-1165 (-176)

Sorting 96 and 192 by size in a timely manner makes the result worse,
but at least the sum is still negative.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux