Re: [PATCH] block: Call .initialize_rq_fn() also for filesystem requests

On Mon, 2017-08-28 at 10:10 +0200, Christoph Hellwig wrote:
> I still disagree that we should have an indirect function call like this
> in the fast path.
> 
> All that can be done by clearing or setting a flag on the first call to
> ->queue_rq or ->queuecommand instead.  In NVMe we use RQF_DONTPREP for
> that, but for SCSI we probably can't use that given that it has more
> meaning for the old request path.  But how about just adding a new
> RQD_DRV_INITIALIZED or similar flag?

Hello Christoph,

Sorry, but I'm not enthusiastic about the proposal to introduce an
RQD_DRV_INITIALIZED or similar flag. That approach involves an annoying
behavior difference, namely that .initialize_rq_fn() would be called from
inside blk_get_request() for pass-through requests and from inside the prep
function for filesystem requests. Another disadvantage of that approach is
that the block layer core never clears request.atomic_flags completely but
only sets and clears individual flags. The SCSI core would have to follow
that model and hence code for clearing RQD_DRV_INITIALIZED would have to be
added to all request completion paths in the SCSI core.
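To make the comparison concrete, here is roughly what I understand the
proposal to look like. RQD_DRV_INITIALIZED does not exist today and the
bit number below is made up, so please read this as a sketch rather than
as working code:

/* Hypothetical flag; a free rq_flags bit would have to be picked. */
#define RQD_DRV_INITIALIZED	((__force req_flags_t)(1 << 23))

static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
				  const struct blk_mq_queue_data *bd)
{
	struct request *req = bd->rq;

	if (!(req->rq_flags & RQD_DRV_INITIALIZED)) {
		/* First time the driver sees this request: initialize it. */
		scsi_initialize_rq(req);
		req->rq_flags |= RQD_DRV_INITIALIZED;
	}
	/* ... */
}

/*
 * ... and since the block layer core only sets and clears individual
 * flags, every completion path in the SCSI core would need a
 *
 *	req->rq_flags &= ~RQD_DRV_INITIALIZED;
 *
 * somewhere, e.g. in scsi_end_request().
 */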

Have you noticed that Ming Lei's patch series introduces several new atomic
operations in the hot path? I'm referring here to the BLK_MQ_S_DISPATCH_BUSY
manipulations. Have you noticed that for SCSI drivers these patches introduce
an overhead between 0.1 and 1.0 microseconds per I/O request in the hot path?
I have derived these numbers from the random write SRP performance numbers
as follows: 1/142460 s - 1/142990 s ≈ 26 nanoseconds. That number has to
be multiplied by the number of I/O requests processed in parallel; an
effective parallelism between roughly 4 and 40 yields the 0.1 to 1.0
microsecond range mentioned above. The number of jobs in Ming Lei's test
was 64, but that's probably way higher than the actual I/O parallelism.

Have you noticed that my patch does not add any atomic instructions to the
hot path, only a read of a function pointer that should already be cache-hot?
As you know, modern CPUs are good at predicting branches. Are you sure that
my patch will have a measurable performance impact?
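For reference, the entire fast-path addition of my patch boils down to
something like the test and indirect call that blk_get_request() already
performs for pass-through requests:

	/* A read of a (cache-hot) function pointer plus a predictable branch. */
	if (q->initialize_rq_fn)
		q->initialize_rq_fn(rq);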

Bart.
