On 04/10/2021 19:52, Bart Van Assche wrote:
> On 10/4/21 5:06 AM, Adrian Hunter wrote:
>> To fit its new purpose as a more general purpose sleeping lock for the
>> host.
>>
>> [ ... ]
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
>> index 9b1ef272fb3c..495e1c0afae3 100644
>> --- a/drivers/scsi/ufs/ufshcd.h
>> +++ b/drivers/scsi/ufs/ufshcd.h
>> @@ -897,7 +897,7 @@ struct ufs_hba {
>>  	enum bkops_status urgent_bkops_lvl;
>>  	bool is_urgent_bkops_lvl_checked;
>> -	struct rw_semaphore clk_scaling_lock;
>> +	struct rw_semaphore host_rw_sem;
>>  	unsigned char desc_size[QUERY_DESC_IDN_MAX];
>>  	atomic_t scsi_block_reqs_cnt;
>
> Hi Adrian,

Thanks for looking at this.

> It seems to me that this patch series goes in another direction than the
> direction the JEDEC UFS committee is going into. The UFSHCI 4.0 specification
> will support MCQ (Multi-Circular Queue). We will benefit most from the v4.0
> MCQ support if there is no contention between the CPUs that submit UFS commands
> to different queues. I think the intention of this patch series is to make a
> single synchronization object protect all submission queues.

The intention is to make the locking easier to understand.

We need either to share access to the host (e.g. ufshcd_queuecommand) or
to provide for exclusive ownership (e.g. ufshcd_err_handler, PM, shutdown).
We should be able to do that with one rw_semaphore.

> I'm concerned that
> this will prevent us from fully benefiting from multiqueue support.

You are talking about contention between ufshcd_queuecommand() running
simultaneously on 2 CPUs, right?  In that case, down_read() should be
practically atomic, so there is no contention unless a third process is
waiting on down_write(), which never happens under normal circumstances.

> Has it been
> considered to eliminate the clk_scaling_lock and instead to use RCU to
> serialize clock scaling against command processing? One possible approach is to
> use blk_mq_freeze_queue() and blk_mq_unfreeze_queue() around the clock scaling
> code. A disadvantage of using RCU is that waiting for an RCU grace period takes
> some time - about 10 ms on my test setup. I have not yet verified what the
> performance and time impact would be of using an expedited RCU grace period
> instead of a regular RCU grace period.

It is probably worth measuring the performance of clk_scaling_lock first.
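
For reference, a minimal sketch of the shared vs. exclusive pattern described
above, assuming the renamed hba->host_rw_sem from the diff. The function names
here are made up for illustration and are not the actual ufshcd call sites:

#include <linux/rwsem.h>

/* struct ufs_hba comes from drivers/scsi/ufs/ufshcd.h (see diff above). */

/* Submission path: any number of submitters may hold the lock shared. */
static int example_submit(struct ufs_hba *hba)
{
	int err = 0;

	down_read(&hba->host_rw_sem);	/* shared access to the host */
	/* ... build and issue the command ... */
	up_read(&hba->host_rw_sem);

	return err;
}

/* Error handler / PM / shutdown: take the lock exclusively. */
static void example_exclusive(struct ufs_hba *hba)
{
	down_write(&hba->host_rw_sem);	/* excludes all submitters */
	/* ... reset, suspend or shut down the host ... */
	up_write(&hba->host_rw_sem);
}

With no writer waiting, the down_read() fast path is essentially a single
atomic update of the semaphore count, which is what I mean by "practically
atomic" in the common case.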
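
And, purely to make the suggested alternative concrete, a rough sketch of the
freeze-based approach (example_clk_scale() is a made-up name; the real scaling
path lives in ufshcd.c). As I understand it, freezing blocks new requests,
waits for in-flight ones, and involves an RCU grace period, which is where the
~10 ms cost mentioned above would come from:

#include <linux/blk-mq.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

static void example_clk_scale(struct ufs_hba *hba)
{
	struct scsi_device *sdev;

	/* Block new requests and drain in-flight ones on every LU's queue. */
	shost_for_each_device(sdev, hba->host)
		blk_mq_freeze_queue(sdev->request_queue);

	/* ... reprogram the clocks here ... */

	shost_for_each_device(sdev, hba->host)
		blk_mq_unfreeze_queue(sdev->request_queue);
}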