Re: [PATCH 4/5] scsi_io_completion: conditional hints on fastpath

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2018-03-09 10:13 AM, Johannes Thumshirn wrote:
On Tue, Mar 06, 2018 at 10:21:33PM -0500, Douglas Gilbert wrote:
ChangeLog:
   - add likely() and unlikely() hints to conditionals on or near the
     fastpath
   - fix a comment in scsi_io_completion_action()

Do you have any numbers what this patch buys us performance wise?

How does 7 nanoseconds sound?

Measuring the time taken by scsi_io_completion() gives figures that
jump around seemingly randomly between 1.2 and 2.0 microseconds
(when the system is under heavy load). Turns out that almost all of
that is caused by the call to scsi_end_request() (and the functions
it calls). So the timing I was after was from the call into
scsi_io_completion() to the point on the fastpath just before
scsi_end_request() is called. ftrace function-graph doesn't even show
this time (perhaps it has a lower threshold value).

So I wrote my own timer, starting and stopping at those points, reading
the hr timer with ktime and only timing on one CPU (to avoid atomics
an mutexes). The input was a fio script used previously for testing
scsi_mq.

The "before" timing was on a kernel build with MKP's tree which now has
my "[PATCH v4] Make SCSI Status CONDITION MET equivalent to GOOD"
patch applied, but not this patchset. Also some of the time being measured
is from the measuring code itself :-)

Here is the "before", averaged over 100 millisecond windows:
  hrtimer: interrupt took 16045 ns
  print_timings: cum_time=100000156 ns, count=630299, avg=158 ns
  print_timings: cum_time=100000126 ns, count=623939, avg=160 ns
  print_timings: cum_time=100000020 ns, count=625046, avg=159 ns
  print_timings: cum_time=100000098 ns, count=624375, avg=160 ns
  print_timings: cum_time=100000060 ns, count=619438, avg=161 ns
  print_timings: cum_time=100000132 ns, count=616604, avg=162 ns
  print_timings: cum_time=100000050 ns, count=617839, avg=161 ns
  print_timings: cum_time=100000079 ns, count=623943, avg=160 ns
  print_timings: cum_time=100000128 ns, count=617545, avg=161 ns
  print_timings: cum_time=100000083 ns, count=617902, avg=161 ns

Here is the "after":
  print_timings: cum_time=100000022 ns, count=628977, avg=158 ns
  print_timings: cum_time=100000021 ns, count=642545, avg=155 ns
  print_timings: cum_time=100000042 ns, count=647620, avg=154 ns
  print_timings: cum_time=100000115 ns, count=649420, avg=153 ns
  print_timings: cum_time=100000001 ns, count=646527, avg=154 ns
  print_timings: cum_time=100000103 ns, count=648608, avg=154 ns
  print_timings: cum_time=100000074 ns, count=650341, avg=153 ns
  print_timings: cum_time=100000115 ns, count=652359, avg=153 ns
  print_timings: cum_time=100000031 ns, count=636437, avg=157 ns
  print_timings: cum_time=100000096 ns, count=632879, avg=158 ns


Attached is some ftrace function-graph output of scsi_io_completion()
which is gzipped for brevity.

Looked at in lines of C code, in the window I'm interested in the
"before" code does:
   - 4 local variable initializations
   - zeros two locals, sets false in another
   - wait_for = (cmd->allowed + 1) * req->timeout;  (local initialization)
   - 5 failed "if"s (one is "else if") with no hints
   - fails to trigger on BUG_ON
   - doesn't print a SCSI_LOG_HLCOMPLETE
   - before it gets to the fastpath exit:
        if (!(blk_rq_bytes(req) == 0 && error) &&
            !scsi_end_request(req, error, good_bytes, 0))
                return;

The "after" code does:
   - 4 local variable initializations
   - 2 failed "if"s with hints
   - one failed "if" with hint in place of BUG_ON
   - doesn't print a SCSI_LOG_HLCOMPLETE
   - before it gets to the fastpath exit which has hints:
        if (likely(blk_rq_bytes(req) > 0 || blk_stat == BLK_STS_OK)) {
                if (likely(!scsi_end_request(req, blk_stat, good_bytes, 0)))
                        return; /* no bytes remaining */
        }

So the difference measured on a Lenovo X270 (i5-7200U) amounts to
around 7 nanoseconds.

Doug Gilbert

Attachment: scsi_io_completion_ftrace.txt.gz
Description: application/gzip


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux