On 2021-05-25 04:10, Bart Van Assche wrote:
On 5/24/21 1:36 AM, Can Guo wrote:
The current UFS IRQ handler is completely wrapped by the host lock, and
because ufshcd_send_command() is also protected by the host lock, when
the IRQ handler fires, not only is the CPU running the IRQ handler
unable to send new requests, the other CPUs cannot either. Move the host
lock wrapping the IRQ handler into specific branches, i.e.,
ufshcd_uic_cmd_compl(), ufshcd_check_errors(), ufshcd_tmc_handler() and
ufshcd_transfer_req_compl(). Meanwhile, to further reduce occupation of
the host lock in ufshcd_transfer_req_compl(), the host lock is no longer
required to call __ufshcd_transfer_req_compl(). As per test, the
optimization can bring considerable gain to random read/write
performance.
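For readers following along, the idea of pushing the lock down into the
completion branch can be sketched in userspace as below. This is only an
illustration under stated assumptions, not the code from the patch: a
pthread mutex stands in for the host lock, and struct sketch_hba and all
sketch_* functions are hypothetical names. The lock covers only the
snapshot of completed slots; the per-request completion work runs after
the lock has been dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_NUM_SLOTS 32

/* Hypothetical stand-in for struct ufs_hba; not the real layout. */
struct sketch_hba {
    pthread_mutex_t host_lock;  /* plays the role of the SCSI host lock */
    uint32_t outstanding_reqs;  /* slots issued and not yet completed */
    uint32_t doorbell;          /* slots the "hardware" still owns */
    int completed;              /* requests completed so far */
};

static void sketch_init(struct sketch_hba *hba)
{
    pthread_mutex_init(&hba->host_lock, NULL);
    hba->outstanding_reqs = 0;
    hba->doorbell = 0;
    hba->completed = 0;
}

/* Submission path: holds the lock only while updating the bitmaps. */
static void sketch_send_command(struct sketch_hba *hba, int slot)
{
    pthread_mutex_lock(&hba->host_lock);
    hba->outstanding_reqs |= UINT32_C(1) << slot;
    hba->doorbell |= UINT32_C(1) << slot;
    pthread_mutex_unlock(&hba->host_lock);
}

/* Per-request completion work; called without host_lock held. */
static void sketch_complete_one(struct sketch_hba *hba, int slot)
{
    (void)slot;
    hba->completed++;
}

/*
 * Completion branch: take the lock just long enough to work out which
 * slots finished, then complete them outside the lock.
 * Returns the number of requests completed in this call.
 */
static int sketch_transfer_req_compl(struct sketch_hba *hba)
{
    uint32_t done;
    int slot, n = 0;

    pthread_mutex_lock(&hba->host_lock);
    done = hba->outstanding_reqs & ~hba->doorbell;
    hba->outstanding_reqs &= ~done;
    pthread_mutex_unlock(&hba->host_lock);

    for (slot = 0; slot < SKETCH_NUM_SLOTS; slot++) {
        if (done & (UINT32_C(1) << slot)) {
            sketch_complete_one(hba, slot);
            n++;
        }
    }
    return n;
}
```

In this sketch a CPU calling sketch_send_command() waits only for the
short bitmap update, not for the whole completion loop.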
Hi Can,
Using the host lock to serialize the completion path against the
submission path was a common practice 11 years ago, before the host lock
push-down (see also
https://linux-scsi.vger.kernel.narkive.com/UEmGgwAc/rfc-patch-scsi-host-lock-push-down).
Modern SCSI LLDs should not use the SCSI host lock. Please consider
introducing one or more new synchronization objects in struct ufs_hba
and using these instead of the SCSI host lock. That will save multiple
pointer dereferences in the hot path since hba->host->host_lock will
become hba->new_spin_lock.
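The pointer-dereference point can be illustrated with a small userspace
sketch. It is an assumption-laden illustration, not driver code: pthread
mutexes stand in for spinlocks, and struct demo_host, struct demo_hba
and the demo_* functions are hypothetical. The one detail taken from the
kernel is that host_lock in struct Scsi_Host is itself a pointer, so
reaching it from the hba costs two dependent loads, while a lock
embedded in the hba is just an offset from the hba pointer.
new_spin_lock is the placeholder name used above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct Scsi_Host. */
struct demo_host {
    pthread_mutex_t default_lock;
    pthread_mutex_t *host_lock;    /* a pointer, as in struct Scsi_Host */
};

/* Hypothetical stand-in for struct ufs_hba. */
struct demo_hba {
    struct demo_host *host;
    pthread_mutex_t new_spin_lock; /* dedicated lock embedded in the hba */
    unsigned long outstanding_reqs;
};

static void demo_init(struct demo_hba *hba, struct demo_host *host)
{
    pthread_mutex_init(&host->default_lock, NULL);
    host->host_lock = &host->default_lock;
    hba->host = host;
    pthread_mutex_init(&hba->new_spin_lock, NULL);
    hba->outstanding_reqs = 0;
}

/* Old style: load hba->host, then host->host_lock, then lock. */
static void demo_mark_issued_host_lock(struct demo_hba *hba, int tag)
{
    pthread_mutex_lock(hba->host->host_lock);
    hba->outstanding_reqs |= 1UL << tag;
    pthread_mutex_unlock(hba->host->host_lock);
}

/* New style: the lock address is computed directly from hba. */
static void demo_mark_issued_own_lock(struct demo_hba *hba, int tag)
{
    pthread_mutex_lock(&hba->new_spin_lock);
    hba->outstanding_reqs |= 1UL << tag;
    pthread_mutex_unlock(&hba->new_spin_lock);
}
```

Both functions protect the same field here only to keep the comparison
short; a real conversion would move every user of that field to the new
lock at once, since two different locks do not exclude each other.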
An additional question is whether it is necessary for v3.0 UFS devices
to serialize the submission path against the completion path. Multiple
high-performance SCSI LLDs support hardware with separate submission and
completion queues and hence do not need any serialization between the
submission and the completion path. I'm asking this because it is likely
that sooner or later multiqueue support will be added to the UFS
specification. Benefiting from multiqueue support will require reworking
the locking in the UFS driver anyway.
Hi Bart,
I agree with all of the above, and what you ask for is exactly what we
are doing in the 3rd change: getting rid of the host lock on the
dispatch and completion paths. I agree with using dedicated spin locks
for dedicated purposes in the UFS driver, e.g., clk gating has its own
gating_lock and clk scaling has its own scaling_lock. But this specific
series is only for improving performance. We will take your comments
into consideration and address them in the future.
Thanks,
Can Guo.
Thanks,
Bart.