> On 2024/1/5 18:50, Uladzislau Rezki wrote:
> > Hello, Wen Gu.
> >
>
> Hi Uladzislau Rezki,
>
> > <...>
> > > Fortunately, thank you for this patch set: the global vmap_area_lock was
> > > removed and a per-node lock, vn->busy.lock, was introduced. It is really
> > > helpful:
> > >
> > > In a 48-CPU qemu environment, the Requests/sec increased fivefold:
> > > - nginx
> > > - wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80
> > >
> > >                vzalloced shmem   vzalloced shmem (with this patch set)
> > > Requests/sec   113536.56         583729.93
> > >
> > Thank you for the confirmation that your workload is improved. The "nginx"
> > result is 5 times better!
>
> Yes, thank you very much for the improvement!
>
> > > But it also has some overhead compared to using kzalloced shared memory
> > > or unsetting CONFIG_HARDENED_USERCOPY, which does not involve finding the
> > > vmap area:
> > >
> > >                kzalloced shmem   vzalloced shmem (CONFIG_HARDENED_USERCOPY unset)
> > > Requests/sec   831950.39         805164.78
> > >
> > CONFIG_HARDENED_USERCOPY prevents copying "wrong" memory regions. That is
> > why, if it is vmalloced memory, it wants to make sure the region is really
> > valid; if not, the user-copy is aborted.
> >
> > So there is extra work that involves finding the VA associated with an
> > address.
>
> Yes, and lock contention in finding the VA is likely to be a performance
> bottleneck, which is mitigated a lot by your work.
>
> > > So, as a newbie in Linux-mm, I would like to ask for some suggestions:
> > >
> > > Is it possible to further eliminate the overhead caused by lock contention
> > > in find_vmap_area() in this scenario (maybe this is asking too much), or is
> > > the only way out to unset CONFIG_HARDENED_USERCOPY or to avoid vzalloced
> > > buffers in situations where concurrent kernel-userspace copies happen?
> > >
> > Could you please try the patch below and see if it improves this series
> > further? Just in case:
>
> Thank you! I tried the patch, and it seems that the wait on the rwlock_t
> is still there, about as long as with the spinlock_t. (The flamegraph is
> attached. Not sure why the read_lock waits so long, given that there is
> no frequent write_lock contention.)
>
>                vzalloced shmem (spinlock_t)   vzalloced shmem (rwlock_t)
> Requests/sec   583729.93                      460007.44
>
> So I guess the overhead of finding the vmap area is inevitable here and
> the original spin_lock is fine in this series.
>
I have also noticed a performance difference between rwlock and spinlock.
So, yes. This is the extra work we need to do if CONFIG_HARDENED_USERCOPY
is set, i.e. find a VA.
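
Roughly, the path looks like this (a simplified sketch of the
check_heap_object() logic in mm/usercopy.c; the exact code differs
between kernel versions):

/*
 * Sketch: what CONFIG_HARDENED_USERCOPY adds on every
 * copy_{to,from}_user() of a vmalloced buffer. Simplified,
 * not the exact upstream code.
 */
static void check_heap_object(const void *ptr, unsigned long n, bool to_user)
{
	unsigned long addr = (unsigned long) ptr;

	if (is_vmalloc_addr(ptr)) {
		/*
		 * find_vmap_area() takes a vmap lock internally. Before
		 * this series it was the global vmap_area_lock; with the
		 * series it is the per-node vn->busy.lock, which is why
		 * the contention is reduced but does not fully disappear.
		 */
		struct vmap_area *area = find_vmap_area(addr);

		if (!area)
			usercopy_abort("vmalloc", "no area", to_user, 0, n);

		/* Abort if the copy would run past the end of the VA. */
		if (n > area->va_end - addr)
			usercopy_abort("vmalloc", NULL, to_user,
				       addr - area->va_start, n);
		return;
	}

	/* ... slab/page checks for non-vmalloc memory ... */
}

--
Uladzislau Rezki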