On Mon, Jan 08, 2024 at 03:45:18PM +0800, Wen Gu wrote:
>
>
> On 2024/1/7 14:59, Hillf Danton wrote:
> > On Sat, 6 Jan 2024 17:36:23 +0100 Uladzislau Rezki <urezki@xxxxxxxxx>
> > > >
> > > > Thank you! I tried the patch, and it seems that the wait for rwlock_t
> > > > also exists, as much as using spinlock_t. (The flamegraph is attached.
> > > > Not sure why the read_lock waits so long, given that there is no frequent
> > > > write_lock competition)
> > > >
> > > >               vzalloced shmem(spinlock_t)  vzalloced shmem(rwlock_t)
> > > > Requests/sec           583729.93                   460007.44
> > > >
> > > > So I guess the overhead in finding vmap area is inevitable here and the
> > > > original spin_lock is fine in this series.
> > > >
> > > I have also noticed a performance difference between rwlock and spinlock.
> > > So, yes. This is what we need to do extra if CONFIG_HARDENED_USERCOPY is
> > > set, i.e. find a VA.
> >
> > See if read bias helps to understand the gap between spinlock and rwlock.
> >
> > --- x/kernel/locking/qrwlock.c
> > +++ y/kernel/locking/qrwlock.c
> > @@ -23,7 +23,7 @@ void __lockfunc queued_read_lock_slowpat
> >  	/*
> >  	 * Readers come here when they cannot get the lock without waiting
> >  	 */
> > -	if (unlikely(in_interrupt())) {
> > +	if (1) {
> >  		/*
> >  		 * Readers in interrupt context will get the lock immediately
> >  		 * if the writer is just waiting (not holding the lock yet),
>
> Thank you for the idea, Hillf!
>
> IIUC, the change makes read operations more likely to acquire the lock
> and modifies fairness to favor readers.
>
> The test in my scenario shows:
>
> vzalloced shmem with  spinlock_t  rwlock_t   rwlock_t (with above change)
> Requests/sec          564961.29   442534.33          439733.31
>
> In addition to read bias, there seem to be other factors that cause the
> gap, but I haven't figured them out yet..
>
<snip>
urezki@pc638:~$ cat locktorture.sh
#!/bin/sh

# available lock types: spin_lock, rw_lock
torture_type=$1
test_time=$2

echo "Start..."

modprobe locktorture $torture_type nreaders_stress=0 > /dev/null 2>&1
sleep $test_time
rmmod locktorture > /dev/null 2>&1

echo "Done."
urezki@pc638:~$
<snip>

Out:

# sudo ./locktorture.sh rw_lock 30
[12107.327566] Writes:  Total: 53304415  Max/Min: 1620715/3225 ???  Fail: 0
[12107.327898] spin_lock-torture: lock_torture_stats is stopping
[12107.328192] Writes:  Total: 53304415  Max/Min: 1620715/3225 ???  Fail: 0
[12107.328368] spin_lock-torture:--- End of test: SUCCESS: acq_writer_lim=0 bind_readers=0-63 bind_writers=0-63 call_rcu_chains=0 long_hold=100 nested_locks=0 nreaders_stress=0 nwriters_stress=128 onoff_holdoff=0 onoff_interval=0 rt_boost=2 rt_boost_factor=50 shuffle_interval=3 shutdown_secs=0 stat_interval=60 stutter=5 verbose=1 writer_fifo=0

# sudo ./locktorture.sh spin_lock 30
[12051.968992] Writes:  Total: 47843400  Max/Min: 1335320/5942 ???  Fail: 0
[12051.969236] spin_lock-torture: lock_torture_stats is stopping
[12051.969507] Writes:  Total: 47843400  Max/Min: 1335320/5942 ???  Fail: 0
[12051.969813] spin_lock-torture:--- End of test: SUCCESS: acq_writer_lim=0 bind_readers=0-63 bind_writers=0-63 call_rcu_chains=0 long_hold=100 nested_locks=0 nreaders_stress=0 nwriters_stress=128 onoff_holdoff=0 onoff_interval=0 rt_boost=2 rt_boost_factor=50 shuffle_interval=3 shutdown_secs=0 stat_interval=60 stutter=5 verbose=1 writer_fifo=0

I do not see a big difference between spin_lock and rw_lock. In fact,
the locktorture.ko test shows that spin_lock is slightly worse, but it
might be something that I missed, or this test may not be accurate
enough.

When it comes to the vmap test-suite and the difference between rw_lock
and spin_lock, I have not spent much time figuring out the difference.
At first glance, it may be that the cache-miss rate is higher when
switching to rw_lock, or that the rw-lock requires more atomics.

--
Uladzislau Rezki