On Fri, Sep 24, 2021 at 08:38:42PM -0700, Bart Van Assche wrote: > On 9/24/21 02:36, Benjamin Block wrote: > > On Fri, Sep 17, 2021 at 05:04:44PM -0700, Bart Van Assche wrote: > > > Conditional statements are faster than indirect calls. Use a member variable > > > to track the SCSI command submitter such that later patches can call > > > scsi_done(scmd) instead of scmd->scsi_done(scmd). > > > > > > The asymmetric behavior that scsi_send_eh_cmnd() sets the submission > > > context to the SCSI error handler and that it does not restore the > > > submission context to the SCSI core is retained. > > > > > > Cc: Hannes Reinecke <hare@xxxxxxxx> > > > Cc: Ming Lei <ming.lei@xxxxxxxxxx> > > > Cc: Christoph Hellwig <hch@xxxxxx> > > > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> > > > --- > > > drivers/scsi/scsi_error.c | 18 +++++++----------- > > > drivers/scsi/scsi_lib.c | 9 +++++++++ > > > drivers/scsi/scsi_priv.h | 1 + > > > include/scsi/scsi_cmnd.h | 7 +++++++ > > > 4 files changed, 24 insertions(+), 11 deletions(-) > > > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > index 572673873ddf..ba6d748a0246 100644 > > > --- a/drivers/scsi/scsi_lib.c > > > +++ b/drivers/scsi/scsi_lib.c > > > @@ -1577,6 +1577,15 @@ static blk_status_t scsi_prepare_cmd(struct request *req) > > > static void scsi_mq_done(struct scsi_cmnd *cmd) > > > { > > > + switch (cmd->submitter) { > > > + case BLOCK_LAYER: > > > + break; > > > + case SCSI_ERROR_HANDLER: > > > + return scsi_eh_done(cmd); > > > + case SCSI_RESET_IOCTL: > > > + return; > > > + } > > > + > > > > Hmm, I'm confused, you replace one kind of branch with different one. Why > > would that increase IOPS by 5%? > > > > Maybe its because the new `submitter` field in `struct scsi_cmnd` is now > > on a hot cache line, whereas `*scsi_done` is not? > > Hi Benjamin, > > To be honest, the 5% improvement is more than I had expected. This is what I > know about indirect function calls vs. branches: > - The target of an indirect branch is predicted by the indirect branch > predictor. For direct branches the Branch Target Buffer (BTB) is used. > - The performance of indirect calls is negatively affected by security > mitigations (CONFIG_RETPOLINE) but not the performance of direct branches > My measurement was run with CONFIG_RETPOLINE off. I expect a larger > difference with CONFIG_RETPOLINE enabled. Ah ok, yeah, that sounds reasonable. Thanks. -- Best Regards, Benjamin Block / Linux on IBM Z Kernel Development / IBM Systems IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy Vorsitz. AufsR.: Gregor Pillen / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294