On 10/4/21 10:06 PM, Adrian Hunter wrote:
> On 04/10/2021 19:52, Bart Van Assche wrote:
>> I'm concerned that this will prevent us from fully benefiting from
>> multiqueue support.
> You are talking about contention between ufshcd_queuecommand() running
> simultaneously on 2 CPUs, right? In that case, down_read() should amount to
> practically a single atomic operation, so there is no contention unless a
> third process is waiting in down_write(), which never happens under normal
> circumstances.
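To make the "practically a single atomic operation" argument concrete, here is a simplified userspace model of a reader-writer lock fast path written with C11 atomics. It is a sketch under stated assumptions, not the kernel's rw_semaphore: the names (rw_gate, gate_read_trylock() and so on) and the WRITER_BIT counter layout are hypothetical, and a real lock would sleep instead of returning false. It shows that an uncontended reader performs one atomic read-modify-write to enter and one to leave, both on a word shared by every CPU that submits commands.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: low bits count readers, one high bit marks a writer.
 * This is NOT the encoding used by the kernel's rw_semaphore. */
#define WRITER_BIT (1LL << 62)

struct rw_gate {
	atomic_llong count; /* a single word shared by all CPUs */
};

/* Reader fast path: one atomic add; fails only if a writer holds the gate. */
static bool gate_read_trylock(struct rw_gate *g)
{
	long long old = atomic_fetch_add_explicit(&g->count, 1,
						  memory_order_acquire);
	if (old & WRITER_BIT) {
		/* Back out; a real lock would sleep and retry here. */
		atomic_fetch_sub_explicit(&g->count, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Reader exit: a second atomic operation on the same shared word. */
static void gate_read_unlock(struct rw_gate *g)
{
	long long old = atomic_fetch_sub_explicit(&g->count, 1,
						  memory_order_release);
	assert((old & ~WRITER_BIT) > 0); /* at least one reader was inside */
}

/* Writer (e.g. clock scaling): succeeds only if nobody else is inside. */
static bool gate_write_trylock(struct rw_gate *g)
{
	long long expected = 0;
	return atomic_compare_exchange_strong_explicit(&g->count, &expected,
						       WRITER_BIT,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void gate_write_unlock(struct rw_gate *g)
{
	long long old = atomic_fetch_sub_explicit(&g->count, WRITER_BIT,
						  memory_order_release);
	assert(old & WRITER_BIT); /* the writer bit must have been set */
}
```

Even without contention, each command still pays for two atomic read-modify-write operations on one cache line that bounces between the submitting CPUs.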
>> Has it been considered to eliminate the clk_scaling_lock and instead to use
>> RCU to serialize clock scaling against command processing? One possible
>> approach is to use blk_mq_freeze_queue() and blk_mq_unfreeze_queue() around
>> the clock scaling code. A disadvantage of using RCU is that waiting for an
>> RCU grace period takes some time (about 10 ms on my test setup). I have not
>> yet verified what the performance and latency impact would be of using an
>> expedited RCU grace period instead of a regular RCU grace period.
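The freeze/unfreeze idea can be approximated in userspace to show why it helps: the hot path touches only a per-CPU counter, while the writer blocks new entries and then waits for in-flight commands to drain. This sketch swaps RCU for a plain sequentially consistent flag-and-counter handshake, so it still uses an atomic read-modify-write per command and is not how blk_mq_freeze_queue() works. In the kernel, freezing is built on the queue's percpu_ref usage counter, whose fast path increments a per-CPU counter rather than a shared atomic. All names here (freeze_gate, NR_SLOTS, gate_enter() and so on) are hypothetical, and the sketch assumes a command exits on the same slot it entered on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4 /* hypothetical: one slot per CPU */

struct slot {
	_Alignas(64) atomic_long inflight; /* own cache line: no sharing */
};

struct freeze_gate {
	atomic_bool frozen;
	struct slot slots[NR_SLOTS];
};

/* Hot path: bump this CPU's counter, then check the flag. The default
 * seq_cst ordering guarantees that either this reader sees frozen == true
 * or the writer's drain check sees the increment. */
static bool gate_enter(struct freeze_gate *g, int cpu)
{
	atomic_fetch_add(&g->slots[cpu].inflight, 1);
	if (atomic_load(&g->frozen)) {
		atomic_fetch_sub(&g->slots[cpu].inflight, 1);
		return false; /* a real implementation would wait for unfreeze */
	}
	return true;
}

/* Must be called with the same cpu that was passed to gate_enter(). */
static void gate_exit(struct freeze_gate *g, int cpu)
{
	long old = atomic_fetch_sub(&g->slots[cpu].inflight, 1);
	assert(old > 0); /* catches an exit without a matching enter */
}

/* Writer, step 1: stop new commands from entering. */
static void gate_freeze_start(struct freeze_gate *g)
{
	atomic_store(&g->frozen, true);
}

/* Writer, step 2: true once no command is in flight on any slot.
 * A real implementation would sleep between polls instead of spinning. */
static bool gate_drained(struct freeze_gate *g)
{
	long sum = 0;
	for (int i = 0; i < NR_SLOTS; i++)
		sum += atomic_load(&g->slots[i].inflight);
	return sum == 0;
}

static void gate_unfreeze(struct freeze_gate *g)
{
	atomic_store(&g->frozen, false);
}
```

The writer side becomes more expensive (it must scan every slot and wait), which mirrors the grace-period cost mentioned above, in exchange for a hot path that never writes to a cache line shared with other CPUs.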
> It is probably worth measuring the performance of clk_scaling_lock first.
Upcoming UFS devices support several million IOPS. My experience, and that of
everyone else working with such storage devices, is that every single atomic
operation in the hot path causes a measurable performance overhead.
down_read() is a synchronization operation, and implementing synchronization
operations without using atomic loads or stores is not possible. This is why
I see clk_scaling_lock as a performance bottleneck.
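As a starting point for the measurement suggested earlier in the thread, a crude single-threaded microbenchmark can compare an atomic read-modify-write with a plain increment. This is a hypothetical sketch, not a UFS benchmark: it relies on POSIX clock_gettime(), it does not model cross-CPU cache-line bouncing (which makes a shared lock word more expensive under multiqueue load), and the numbers it prints will vary by CPU and compiler.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 10000000L

/* volatile forces a real load and store on every iteration */
static volatile long plain_counter;
static atomic_long atomic_counter;

static double elapsed_ns(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

static void run_benchmark(void)
{
	struct timespec t0, t1, t2;

	plain_counter = 0;
	atomic_store(&atomic_counter, 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < ITERATIONS; i++)
		plain_counter = plain_counter + 1; /* ordinary load + store */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	for (long i = 0; i < ITERATIONS; i++)
		atomic_fetch_add(&atomic_counter, 1); /* atomic read-modify-write */
	clock_gettime(CLOCK_MONOTONIC, &t2);

	/* Sanity check: both loops performed every increment. */
	assert(plain_counter == ITERATIONS);
	assert(atomic_load(&atomic_counter) == ITERATIONS);

	printf("plain:  %.2f ns/op\n", elapsed_ns(t0, t1) / ITERATIONS);
	printf("atomic: %.2f ns/op\n", elapsed_ns(t1, t2) / ITERATIONS);
}
```

Running the atomic loop from several threads on the same counter would come closer to the contended case, but measuring with real I/O on the target device remains the more convincing test.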
Thanks,
Bart.