On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote: > On Sun 30-06-19 15:56:45, Pengfei Li wrote: > > Hi, > > > > This series of patches is to reduce the size of struct vmap_area. > > > > Since the members of struct vmap_area are not being used at the same time, > > it is possible to reduce its size by placing several members that are not > > used at the same time in a union. > > > > The first 4 patches did some preparatory work for this and improved > > readability. > > > > The fifth patch is the main patch, it did the work of rewriting vmap_area. > > > > More details can be obtained from the commit message. > > None of the commit messages talk about the motivation. Why do we want to > add quite some code to achieve this? How much do we save? This all > should be a part of the cover letter. > > > Thanks, > > > > Pengfei > > > > Pengfei Li (5): > > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area() > > mm/vmalloc.c: Introduce a wrapper function of > > insert_vmap_area_augment() > > mm/vmalloc.c: Rename function __find_vmap_area() for readability > > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability > > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size > > > > include/linux/vmalloc.h | 28 +++++--- > > mm/vmalloc.c | 144 +++++++++++++++++++++++++++------------- > > 2 files changed, 117 insertions(+), 55 deletions(-) > > > > -- > > 2.21.0 > > Pengfei Li (5): > > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area() > > mm/vmalloc.c: Introduce a wrapper function of > > insert_vmap_area_augment() > > mm/vmalloc.c: Rename function __find_vmap_area() for readability > > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability > > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about that and i have patches in my pipeline to send out but implementation is different. I had a look at all 5 patches. What you are doing is reasonable to me, i mean when it comes to the idea of reducing the size to L1 cache line. I have a concern about implementation and all logic around when we can use va_start and when it is something else. It is not optimal at least to me, from performance point of view and complexity. All hot paths and tree traversal are affected by that. For example running the vmalloc test driver against this series shows the following delta: <5.2.0-rc6+> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec <5.2.0-rc6+> <this series> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec <this series> For example, the degrade in second test is ~15%. -- Vlad Rezki