On 9/24/21 02:36, Benjamin Block wrote:
On Fri, Sep 17, 2021 at 05:04:44PM -0700, Bart Van Assche wrote:
Conditional statements are faster than indirect calls. Use a member variable
to track the SCSI command submitter such that later patches can call
scsi_done(scmd) instead of scmd->scsi_done(scmd).
The asymmetric behavior that scsi_send_eh_cmnd() sets the submission
context to the SCSI error handler and that it does not restore the
submission context to the SCSI core is retained.
Cc: Hannes Reinecke <hare@xxxxxxxx>
Cc: Ming Lei <ming.lei@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
---
drivers/scsi/scsi_error.c | 18 +++++++-----------
drivers/scsi/scsi_lib.c | 9 +++++++++
drivers/scsi/scsi_priv.h | 1 +
include/scsi/scsi_cmnd.h | 7 +++++++
4 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 572673873ddf..ba6d748a0246 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1577,6 +1577,15 @@ static blk_status_t scsi_prepare_cmd(struct request *req)
static void scsi_mq_done(struct scsi_cmnd *cmd)
{
+ switch (cmd->submitter) {
+ case BLOCK_LAYER:
+ break;
+ case SCSI_ERROR_HANDLER:
+ return scsi_eh_done(cmd);
+ case SCSI_RESET_IOCTL:
+ return;
+ }
+
Hmm, I'm confused, you replace one kind of branch with different one. Why
would that increase IOPS by 5%?
Maybe its because the new `submitter` field in `struct scsi_cmnd` is now
on a hot cache line, whereas `*scsi_done` is not?
Hi Benjamin,
To be honest, the 5% improvement is more than I had expected. This is what I
know about indirect function calls vs. branches:
- The target of an indirect branch is predicted by the indirect branch
predictor. For direct branches the Branch Target Buffer (BTB) is used.
- The performance of indirect calls is negatively affected by security
mitigations (CONFIG_RETPOLINE) but not the performance of direct branches
My measurement was run with CONFIG_RETPOLINE off. I expect a larger
difference with CONFIG_RETPOLINE enabled.
Maybe I triggered inefficient behavior of the indirect branch predictor with
the workload I ran.
Bart.