Re: [PATCH 01/84] scsi: core: Use a member variable to track the SCSI command submitter

On 9/24/21 02:36, Benjamin Block wrote:
On Fri, Sep 17, 2021 at 05:04:44PM -0700, Bart Van Assche wrote:
Conditional statements are faster than indirect calls. Use a member variable
to track the SCSI command submitter such that later patches can call
scsi_done(scmd) instead of scmd->scsi_done(scmd).

The asymmetric behavior that scsi_send_eh_cmnd() sets the submission
context to the SCSI error handler and that it does not restore the
submission context to the SCSI core is retained.

Cc: Hannes Reinecke <hare@xxxxxxxx>
Cc: Ming Lei <ming.lei@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
---
  drivers/scsi/scsi_error.c | 18 +++++++-----------
  drivers/scsi/scsi_lib.c   |  9 +++++++++
  drivers/scsi/scsi_priv.h  |  1 +
  include/scsi/scsi_cmnd.h  |  7 +++++++
  4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 572673873ddf..ba6d748a0246 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1577,6 +1577,15 @@ static blk_status_t scsi_prepare_cmd(struct request *req)
static void scsi_mq_done(struct scsi_cmnd *cmd)
  {
+	switch (cmd->submitter) {
+	case BLOCK_LAYER:
+		break;
+	case SCSI_ERROR_HANDLER:
+		return scsi_eh_done(cmd);
+	case SCSI_RESET_IOCTL:
+		return;
+	}
+

Hmm, I'm confused, you replace one kind of branch with different one. Why
would that increase IOPS by 5%?

Maybe it's because the new `submitter` field in `struct scsi_cmnd` is now
on a hot cache line, whereas `*scsi_done` is not?

Hi Benjamin,

To be honest, the 5% improvement is more than I had expected. This is what I
know about indirect function calls vs. branches:
- The target of an indirect branch is predicted by the indirect branch
  predictor. For direct branches the Branch Target Buffer (BTB) is used.
- The performance of indirect calls is negatively affected by security
  mitigations (CONFIG_RETPOLINE), but the performance of direct branches
  is not. My measurement was run with CONFIG_RETPOLINE off; I expect a
  larger difference with CONFIG_RETPOLINE enabled.

Maybe I triggered inefficient behavior of the indirect branch predictor with
the workload I ran.

Bart.
