Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree

> 
> On 2024/1/5 18:50, Uladzislau Rezki wrote:
> 
> > Hello, Wen Gu.
> > 
> > > 
> > > Hi Uladzislau Rezki,
> > > 
> 
> <...>
> 
> > > Fortunately, thanks to this patch set, the global vmap_area_lock was
> > > removed and a per-node lock, vn->busy.lock, is introduced. It is really helpful:
> > > 
> > > In a 48-CPU qemu environment, the Requests/s increased by 5 times:
> > > - nginx
> > > - wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80
> > > 
> > >                  vzalloced shmem      vzalloced shmem(with this patch set)
> > > Requests/sec          113536.56            583729.93
> > > 
> > > 
> > Thank you for the confirmation that your workload is improved. The "nginx"
> > is 5 times better!
> > 
> 
> Yes, thank you very much for the improvement!
> 
> > > But it also has some overhead, compared to using kzalloced shared memory
> > > or unsetting CONFIG_HARDENED_USERCOPY, which won't involve finding vmap area:
> > > 
> > >                  kzalloced shmem      vzalloced shmem(unset CONFIG_HARDENED_USERCOPY)
> > > Requests/sec          831950.39            805164.78
> > > 
> > > 
> > CONFIG_HARDENED_USERCOPY prevents copying "wrong" memory regions. That is
> > why, if the memory is vmalloced, it wants to make sure that is really true;
> > if not, the user-copy is aborted.
> > 
> > So there is extra work that involves finding a VA associated with an address.
> > 
> 
> Yes, and lock contention in finding VA is likely to be a performance bottleneck,
> which is mitigated a lot by your work.
> 
> > > So, as a newbie in Linux-mm, I would like to ask for some suggestions:
> > > 
> > > Is it possible to further eliminate the overhead caused by lock contention
> > > in find_vmap_area() in this scenario (maybe this is asking too much), or the
> > > only way out is not setting CONFIG_HARDENED_USERCOPY or not using vzalloced
> > > buffer in the situation where concurrent kernel-userspace-copy happens?
> > > 
> > Could you please try below patch, if it improves this series further?
> > Just in case:
> > 
> 
> Thank you! I tried the patch, and it seems that the wait for rwlock_t
> is still there, as much as with spinlock_t. (The flamegraph is attached.
> Not sure why the read_lock waits so long, given that there is no frequent
> write_lock competition.)
> 
>                vzalloced shmem(spinlock_t)   vzalloced shmem(rwlock_t)
> Requests/sec         583729.93                     460007.44
> 
> So I guess the overhead in finding vmap area is inevitable here and the
> original spin_lock is fine in this series.
> 
I have also noticed a performance difference between rwlock and spinlock.
So, yes. Finding a VA is the extra work we have to do if
CONFIG_HARDENED_USERCOPY is set.
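
To illustrate the extra work discussed above, here is a rough user-space
sketch, not the kernel code. It models a usercopy check that must look up
the area containing an address under a per-node lock before the copy is
allowed. Everything in it is an assumption made for illustration: the names
(struct node, addr_to_node, insert_area, usercopy_ok), the constants
NR_NODES, ZONE_SIZE and MAX_AREAS, the flat array used instead of an
rb-tree, and the C11 atomic_flag standing in for a spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES  4           /* hypothetical number of nodes */
#define ZONE_SIZE 0x10000UL   /* hypothetical address span hashed to one node */
#define MAX_AREAS 8           /* hypothetical per-node capacity */

struct area { uintptr_t start, end; };   /* covers [start, end) */

struct node {
    atomic_flag lock;                    /* stands in for a per-node spinlock */
    struct area areas[MAX_AREAS];        /* flat array instead of an rb-tree */
    size_t nr;
};

static struct node nodes[NR_NODES] = {
    { .lock = ATOMIC_FLAG_INIT }, { .lock = ATOMIC_FLAG_INIT },
    { .lock = ATOMIC_FLAG_INIT }, { .lock = ATOMIC_FLAG_INIT },
};

static void node_lock(struct node *n)
{
    /* Spin until the flag was previously clear, i.e. we own the lock. */
    while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
        ;
}

static void node_unlock(struct node *n)
{
    atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Hash an address to a node index, so unrelated lookups take different locks. */
static size_t addr_to_node(uintptr_t addr)
{
    return (addr / ZONE_SIZE) % NR_NODES;
}

/* Register [start, start + size) in the node that its start address hashes to. */
static int insert_area(uintptr_t start, size_t size)
{
    struct node *n = &nodes[addr_to_node(start)];
    int ret = -1;                        /* -1 if the node is full */

    assert(size > 0);
    node_lock(n);
    if (n->nr < MAX_AREAS) {
        n->areas[n->nr++] = (struct area){ start, start + size };
        ret = 0;
    }
    node_unlock(n);
    return ret;
}

/*
 * Model of the check: return 1 only if [addr, addr + len) lies inside one
 * registered area. The search starts at the node the address hashes to and
 * walks the remaining nodes, because an area may cross a zone boundary.
 */
static int usercopy_ok(uintptr_t addr, size_t len)
{
    size_t first = addr_to_node(addr);

    for (size_t k = 0; k < NR_NODES; k++) {
        struct node *n = &nodes[(first + k) % NR_NODES];
        int found = 0, ok = 0;

        node_lock(n);
        for (size_t i = 0; i < n->nr; i++) {
            const struct area *a = &n->areas[i];
            if (addr >= a->start && addr < a->end) {
                found = 1;
                ok = len <= a->end - addr;   /* copy must not run past the area */
                break;
            }
        }
        node_unlock(n);
        if (found)
            return ok;
    }
    return 0;   /* no area contains addr: the copy would be aborted */
}
```

In this model every copy pays for one lock acquisition and one search even
in the best case, which is the kind of cost that disappears when the buffer
is not vmalloced or the check is compiled out. Splitting the lock per node
only reduces how often concurrent lookups collide on the same lock.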

--
Uladzislau Rezki



