Re: [PATCH] scsi, fcoe, libfc: drop scsi host_lock use from fc_queuecommand


On Sun, 2010-09-12 at 18:37 +0200, Bart Van Assche wrote:
> On Sat, Sep 4, 2010 at 12:27 AM, Vasu Dev <vasu.dev@xxxxxxxxx> wrote:
> >
> > Currently fc_queuecommand drops this lock very early on and then re-acquires
> > it just before returning; the re-acquired lock is then dropped immediately by
> > its fast-path caller scsi_dispatch_cmd. This re-acquire followed by an
> > immediate drop on every IO hurts performance, especially with small IOs on
> > multi-core systems; the hit is significant, about 25% with 512-byte IOs.
> >
> > This lock is not needed in fc_queuecommand, and calling fc_queuecommand
> > without it held removes the performance hit described above.
> >
> > So this patch adds an unlocked_qcmds flag to drop host_lock before
> > calling fc_queuecommand only, removing the re-acquire and immediate
> > drop on each IO. The lock is dropped only if the LLD asks for it, as
> > fcoe.ko does here for fc_queuecommand. This change won't affect any
> > existing LLD, since the added unlocked_qcmds flag will be zero in
> > those cases and their host_lock usage effectively remains the same
> > after this patch.
> [ ... ]
> 
> Hello Vasu,
> 
> While I don't doubt that your patch improves performance, there might
> be more bottlenecks in the Linux storage stack than the SCSI host
> lock. 

Yes, there are. The upper blocks of the fcoe stack (the SCSI and block
layers) were still the biggest hit in my perf profiling even after the
unlocked_qcmds change. As I mentioned, unlocked_qcmds helped most with
small IOs, and that at a 10G host link; our validation verified the
same. Improving on the remaining bottlenecks higher up in the fcoe
stack should give better performance for all IO sizes at various link
speeds and for various transports.
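
(For readers following the thread, a minimal sketch of the idea behind
unlocked_qcmds, not the actual patch text: the midlayer dispatch path
gives up host_lock around ->queuecommand() only when the LLD sets the
flag, so existing drivers keep their current locking. The field
shost->unlocked_qcmds and the helper name below are illustrative
assumptions.)

#include <linux/spinlock.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Sketch of a host_lock-optional dispatch; "unlocked_qcmds" is assumed
 * to be a flag the LLD sets at host registration time. */
static int dispatch_cmd_sketch(struct scsi_cmnd *cmd,
                               void (*done)(struct scsi_cmnd *))
{
        struct Scsi_Host *shost = cmd->device->host;
        unsigned long flags;
        int rtn;

        spin_lock_irqsave(shost->host_lock, flags);
        /* ... per-host busy accounting still done under host_lock ... */

        if (shost->unlocked_qcmds) {
                /* LLD (e.g. fcoe/libfc) asked for an unlocked queuecommand:
                 * drop the lock here instead of inside fc_queuecommand(). */
                spin_unlock_irqrestore(shost->host_lock, flags);
                rtn = shost->hostt->queuecommand(cmd, done);
        } else {
                /* default: queuecommand runs with host_lock held, and the
                 * lock is dropped right after it returns, as today. */
                rtn = shost->hostt->queuecommand(cmd, done);
                spin_unlock_irqrestore(shost->host_lock, flags);
        }
        return rtn;
}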
 
> After having converted the SRP initiator (ib_srp) to use per-LUN
> locking instead of per-host locking and after having enabled

Per-LUN locking looks like a good idea and should mitigate host_lock
bottlenecks as long as a LUN is not used by all host cores
simultaneously; perhaps some use cases would even ensure that by
pinning a LUN to a VM on a specific core. Is this change already
upstream or queued for upstream? Is it going to eliminate the host
lock from the fast path completely?
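
(Roughly, per-LUN locking in an LLD would look something like the
sketch below; this is illustrative only, not the actual ib_srp
conversion, and the my_lun_priv/my_queuecommand names are made up.)

#include <linux/spinlock.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>

struct my_lun_priv {                    /* hypothetical per-LUN driver data */
        spinlock_t lock;                /* protects only this LUN's state   */
        unsigned int inflight;          /* example state guarded by it      */
        unsigned int max_inflight;
};

static int my_queuecommand(struct scsi_cmnd *cmd,
                           void (*done)(struct scsi_cmnd *))
{
        struct my_lun_priv *lun = cmd->device->hostdata;
        unsigned long flags;

        /* contend only with other IO to this LUN, not the whole host */
        spin_lock_irqsave(&lun->lock, flags);
        if (lun->inflight >= lun->max_inflight) {
                spin_unlock_irqrestore(&lun->lock, flags);
                return SCSI_MLQUEUE_DEVICE_BUSY;
        }
        lun->inflight++;
        spin_unlock_irqrestore(&lun->lock, flags);

        cmd->scsi_done = done;  /* stash the midlayer completion callback */
        /* ... build and post the command without any host-wide lock ... */
        return 0;
}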

> unlocked_qcmds, I noticed that the largest source of contention was in
> the block layer. There was ten times more contention on
> request_queue.queue_lock than on any other spinlock. As you can see
> below, the four most contended locks in the test I ran were:
> * request_queue.queue_lock, needed by __make_request() in blk-core.c.
> * srp_target_port.lock, the per-LUN lock in ib_srp.c.
> * The SCSI host lock.
> * An AIO lock.
> 
> The test I ran used libaio and the NOOP scheduler. Block merging was
> disabled via sysfs.
> 
> From /proc/lock_stat:
> 
> lock_stat version 0.3
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>               &(&q->__queue_lock)->rlock:   11093047   11093901   0.22   9.37   10163231.15   41697489   74851829   0.00   21.14   60656375.82
>               --------------------------
>               &(&q->__queue_lock)->rlock    2093562   [<ffffffff811af406>] __make_request+0x56/0x500
>               &(&q->__queue_lock)->rlock          2   [<ffffffff811bcd70>] cfq_get_queue+0xf0/0x2a0
>               &(&q->__queue_lock)->rlock          1   [<ffffffff811bd3e4>] cfq_set_request+0x414/0x500
>               &(&q->__queue_lock)->rlock         36   [<ffffffff811bd049>] cfq_set_request+0x79/0x500
>               --------------------------
>               &(&q->__queue_lock)->rlock         17   [<ffffffff811bd049>] cfq_set_request+0x79/0x500
>               &(&q->__queue_lock)->rlock    1007349   [<ffffffff811af406>] __make_request+0x56/0x500
>               &(&q->__queue_lock)->rlock    3036167   [<ffffffff811af45f>] __make_request+0xaf/0x500
>               &(&q->__queue_lock)->rlock          1   [<ffffffff811bcd70>] cfq_get_queue+0xf0/0x2a0
> 
> ...............................................................................................................................................................................................
> 
>                  &(&target->lock)->rlock:   1098815   1098945   0.21   3.67   317746.64   16107020   48401787   0.00   14.65   11997072.82
>                  -----------------------
>                  &(&target->lock)->rlock    220238   [<ffffffffa04e1307>] __srp_get_tx_iu+0x47/0xd0 [ib_srp]
>                  &(&target->lock)->rlock    366255   [<ffffffffa04e2ee0>] srp_queuecommand+0xb0/0xad0 [ib_srp]
>                  &(&target->lock)->rlock    297993   [<ffffffffa04e25b6>] srp_remove_req+0x46/0x80 [ib_srp]
>                  &(&target->lock)->rlock    114850   [<ffffffffa04e2ae2>] srp_handle_recv+0x182/0x420 [ib_srp]
>                  -----------------------
>                  &(&target->lock)->rlock    277148   [<ffffffffa04e2ee0>] srp_queuecommand+0xb0/0xad0 [ib_srp]
>                  &(&target->lock)->rlock    142360   [<ffffffffa04e2ae2>] srp_handle_recv+0x182/0x420 [ib_srp]
>                  &(&target->lock)->rlock    245220   [<ffffffffa04e1307>] __srp_get_tx_iu+0x47/0xd0 [ib_srp]
>                  &(&target->lock)->rlock    266847   [<ffffffffa04e25b6>] srp_remove_req+0x46/0x80 [ib_srp]
> 
> ...............................................................................................................................................................................................
> 
>               &(shost->host_lock)->rlock:   989776   989903   0.21   5.78   271707.74   20089204   38990409   0.00   7.07   9908160.13
>               --------------------------
>               &(shost->host_lock)->rlock    123198   [<ffffffffa0037583>] scsi_request_fn+0x163/0x4d0 [scsi_mod]
>               &(shost->host_lock)->rlock    235047   [<ffffffffa003036c>] scsi_dispatch_cmd+0x11c/0x3e0 [scsi_mod]
>               &(shost->host_lock)->rlock    389901   [<ffffffffa00383cd>] scsi_device_unbusy+0x3d/0xd0 [scsi_mod]
>               &(shost->host_lock)->rlock    241762   [<ffffffffa0036824>] scsi_run_queue+0x54/0x3d0 [scsi_mod]
>               --------------------------
>               &(shost->host_lock)->rlock    107827   [<ffffffffa0037583>] scsi_request_fn+0x163/0x4d0 [scsi_mod]
>               &(shost->host_lock)->rlock     95756   [<ffffffffa003036c>] scsi_dispatch_cmd+0x11c/0x3e0 [scsi_mod]
>               &(shost->host_lock)->rlock    316122   [<ffffffffa0036824>] scsi_run_queue+0x54/0x3d0 [scsi_mod]
>               &(shost->host_lock)->rlock    470203   [<ffffffffa00383cd>] scsi_device_unbusy+0x3d/0xd0 [scsi_mod]
> 
> ...............................................................................................................................................................................................
> 
>                 &(&ctx->ctx_lock)->rlock:   399855   399868   0.22   4.22   143530.86   7351017   50104676   0.00   8.92   14706301.05
>                 ------------------------
>                 &(&ctx->ctx_lock)->rlock    194753   [<ffffffff81158805>] aio_complete+0x45/0x260
>                 &(&ctx->ctx_lock)->rlock     67287   [<ffffffff81159434>] do_io_submit+0x214/0x7b0
>                 &(&ctx->ctx_lock)->rlock     56529   [<ffffffff81157da5>] __aio_get_req+0x95/0x1a0
>                 &(&ctx->ctx_lock)->rlock     54461   [<ffffffff8115785d>] aio_put_req+0x2d/0x60
>                 ------------------------
>                 &(&ctx->ctx_lock)->rlock    205740   [<ffffffff81158805>] aio_complete+0x45/0x260
>                 &(&ctx->ctx_lock)->rlock    125933   [<ffffffff8115785d>] aio_put_req+0x2d/0x60
>                 &(&ctx->ctx_lock)->rlock     37042   [<ffffffff81157da5>] __aio_get_req+0x95/0x1a0
>                 &(&ctx->ctx_lock)->rlock     29197   [<ffffffff81159434>] do_io_submit+0x214/0x7b0
> 
> ...............................................................................................................................................................................................
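
(As an aside, the test setup described above, the NOOP scheduler with
request merging disabled via sysfs, can be reproduced with something
like the sketch below; "sdX" is a placeholder device name, and one
would normally just echo these values from a shell instead.)

#include <stdio.h>

/* write a single value to a sysfs attribute */
static int write_sysfs(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs(val, f);
        return fclose(f);
}

int main(void)
{
        /* select the NOOP I/O scheduler for the test device */
        write_sysfs("/sys/block/sdX/queue/scheduler", "noop");
        /* disable request merging; newer kernels accept 2 (no merging
         * at all), older ones only 0/1 */
        write_sysfs("/sys/block/sdX/queue/nomerges", "2");
        return 0;
}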

Maybe the block layer experts are already working on improving these
code paths; I don't know much about that part, and in any case this
goes beyond what I was trying to accomplish with the simple
unlocked_qcmds change.

	Thanks
	Vasu


