Uladzislau Rezki <urezki@xxxxxxxxx> 于2019年7月1日周一 下午6:11写道: > > On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote: > > On Sun 30-06-19 15:56:45, Pengfei Li wrote: > > > Hi, > > > > > > This series of patches is to reduce the size of struct vmap_area. > > > > > > Since the members of struct vmap_area are not being used at the same time, > > > it is possible to reduce its size by placing several members that are not > > > used at the same time in a union. > > > > > > The first 4 patches did some preparatory work for this and improved > > > readability. > > > > > > The fifth patch is the main patch, it did the work of rewriting vmap_area. > > > > > > More details can be obtained from the commit message. > > > > None of the commit messages talk about the motivation. Why do we want to > > add quite some code to achieve this? How much do we save? This all > > should be a part of the cover letter. > > > > > Thanks, > > > > > > Pengfei > > > > > > Pengfei Li (5): > > > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area() > > > mm/vmalloc.c: Introduce a wrapper function of > > > insert_vmap_area_augment() > > > mm/vmalloc.c: Rename function __find_vmap_area() for readability > > > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability > > > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size > > > > > > include/linux/vmalloc.h | 28 +++++--- > > > mm/vmalloc.c | 144 +++++++++++++++++++++++++++------------- > > > 2 files changed, 117 insertions(+), 55 deletions(-) > > > > > > -- > > > 2.21.0 > > > > Pengfei Li (5): > > > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area() > > > mm/vmalloc.c: Introduce a wrapper function of > > > insert_vmap_area_augment() > > > mm/vmalloc.c: Rename function __find_vmap_area() for readability > > > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability > > > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size > Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about > that and i have patches in my pipeline to send out but implementation is different. > > I had a look at all 5 patches. What you are doing is reasonable to me, i mean when > it comes to the idea of reducing the size to L1 cache line. > Thank you for your review. > I have a concern about implementation and all logic around when we can use va_start > and when it is something else. It is not optimal at least to me, from performance point > of view and complexity. All hot paths and tree traversal are affected by that. > > For example running the vmalloc test driver against this series shows the following > delta: > > <5.2.0-rc6+> > Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec > Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec > Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec > <5.2.0-rc6+> > > <this series> > Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec > Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec > Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec > <this series> > > For example, the degrade in second test is ~15%. > > -- > Vlad Rezki Hi, Vlad I think the reason for the performance degradation is that the value of va_start is obtained by va->vm->addr. And since the vmap area in the BUSY tree is always page-aligned, there is no reason for _va_vmlid to override va_start, just let the va->flags use the bits that lower than PAGE_OFFSET. I will use this implementation in the next version and show almost no performance penalty in my local tests. I will send the next version soon. Thank you for taking your time for the review. Best regards, Pengfei