On 2017-07-17 18:56, Keith Busch wrote:
> On Mon, Jul 17, 2017 at 06:46:11PM -0400, Sinan Kaya wrote:
>> Hi Keith,
>>
>> On 7/17/2017 6:45 PM, Keith Busch wrote:
>> > On Mon, Jul 17, 2017 at 06:36:23PM -0400, Sinan Kaya wrote:
>> >> The code updates the completion queue doorbell only after processing all
>> >> completed events, sending a callback to the block layer on each iteration.
>> >>
>> >> This causes a performance drop when a lot of jobs are queued towards
>> >> the HW. Update the completion queue doorbell on each loop iteration
>> >> instead, allowing the HW to queue new jobs.
>> >
>> > That doesn't make sense. Aggregating doorbell writes should be much more
>> > efficient for high-depth workloads.
>> >
>>
>> The problem is that the code is throttling the HW, as the HW cannot queue
>> more completions until SW gets a chance to clear them.
>>
>> As an example:
>>
>> for each completion in N
>> {
>>         blk_layer();
>> }
>> ring doorbell
>>
>> The HW cannot queue a new job until N x blk_layer operations are processed
>> and queue element ownership is passed to the HW after the loop. The HW is
>> just sitting idle if no queue entries are available.
>
> If no completion queue entries are available, then there can't possibly
> be any submission queue entries for the HW to work on either.
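Keith's argument can be checked with a toy model. This is a hypothetical sketch, not driver code, and it rests on stated assumptions: the SQ and CQ both have QDEPTH slots, a ring of QDEPTH slots holds at most QDEPTH - 1 entries, the host keeps at most QDEPTH - 1 commands outstanding (a command stops counting only once its completion is reaped), and each command produces exactly one completion. Because unfetched, in-flight, and posted-but-unreaped commands always add up to the outstanding count, a full CQ would mean the device has nothing left to complete. All names (`qpair_model`, `run_model`, etc.) are made up for illustration.

```c
#include <assert.h>

/* Toy model of one NVMe-style queue pair (hypothetical, not driver code).
 * Assumptions: SQ and CQ both have QDEPTH slots, a ring of QDEPTH slots
 * holds at most QDEPTH - 1 entries, and the host never has more than
 * QDEPTH - 1 commands outstanding. */
#define QDEPTH 8
#define CQ_CAPACITY (QDEPTH - 1)
#define MAX_OUTSTANDING (QDEPTH - 1)

struct qpair_model {
    int sq_pending;   /* submitted, not yet fetched by the device */
    int in_flight;    /* fetched by the device, not yet completed */
    int cq_used;      /* completions posted, not yet reaped by the host */
    int outstanding;  /* host view: submitted and not yet reaped */
    int blocked;      /* times the device had work but no free CQ slot */
};

/* Host fills the SQ up to its outstanding-command limit. */
static void host_submit(struct qpair_model *m)
{
    while (m->outstanding < MAX_OUTSTANDING) {
        m->sq_pending++;
        m->outstanding++;
    }
}

/* Device fetches all pending commands, then tries to complete one. */
static void device_step(struct qpair_model *m)
{
    m->in_flight += m->sq_pending;
    m->sq_pending = 0;
    if (m->in_flight == 0)
        return;
    if (m->cq_used == CQ_CAPACITY) {
        m->blocked++;            /* work available but CQ full */
        return;
    }
    m->in_flight--;
    m->cq_used++;
}

/* Host reaps up to n completions, freeing one outstanding slot each. */
static void host_reap(struct qpair_model *m, int n)
{
    while (n-- > 0 && m->cq_used > 0) {
        m->cq_used--;
        m->outstanding--;
    }
}

/* Run the model with a host that only reaps every reap_every rounds
 * (reap_every must be > 0) and return how often the device was
 * blocked by a full CQ while it still had commands to complete. */
static int run_model(int rounds, int reap_every)
{
    struct qpair_model m = {0};
    for (int r = 0; r < rounds; r++) {
        host_submit(&m);
        for (int i = 0; i < 3; i++)   /* device completes up to 3 per round */
            device_step(&m);
        if (r % reap_every == 0)
            host_reap(&m, CQ_CAPACITY);
        /* every outstanding command is in exactly one of three places */
        assert(m.sq_pending + m.in_flight + m.cq_used == m.outstanding);
    }
    return m.blocked;
}
```

Even with a host that reaps lazily, e.g. `run_model(1000, 5)`, the blocked count stays at zero under these assumptions; the model says nothing about hardware that does not follow them.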

Maybe I need to understand the design better. I was curious why the
completion and submission queues were protected by a single lock, which
causes lock contention.

I was treating each queue independently. I have seen slightly better
performance with an early doorbell; that was my explanation.
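To make the two doorbell placements concrete, here is a small cost model. It is a hypothetical sketch, not the nvme driver code: the struct, the function names, and the assumption that each block layer callback takes one unit of time are illustrative only. It counts CQ head doorbell writes and, summed over all entries, how long each CQ slot stays unavailable to the HW.

```c
#include <assert.h>

/* Hypothetical cost model for reaping n completions. Assumption: one
 * block layer callback == one time unit; doorbell cost is not modeled. */
struct reap_cost {
    int doorbell_writes;   /* MMIO writes to the CQ head doorbell */
    long slot_wait_units;  /* sum over entries of time until the HW
                              sees that CQ slot as free again */
};

/* Ring the doorbell once, after every completion has been handled. */
static struct reap_cost reap_batched(int n)
{
    struct reap_cost c = {0, 0};
    int now = 0;
    for (int i = 0; i < n; i++)
        now++;                            /* handle completion i */
    c.doorbell_writes = n > 0 ? 1 : 0;    /* single write after the loop */
    c.slot_wait_units = (long)n * now;    /* every slot freed at time n */
    return c;
}

/* Ring the doorbell after each completion ("early doorbell"). */
static struct reap_cost reap_per_entry(int n)
{
    struct reap_cost c = {0, 0};
    int now = 0;
    for (int i = 0; i < n; i++) {
        now++;                            /* handle completion i */
        c.doorbell_writes++;              /* publish the new head now */
        c.slot_wait_units += now;         /* slot i freed at time i + 1 */
    }
    return c;
}
```

For n = 16 the batched loop issues 1 doorbell write against 16 for the per-entry loop, while the summed slot wait drops from 256 to 136 units. The model only counts; which side wins in practice depends on how expensive a doorbell MMIO write is on the platform and on whether CQ space is ever the bottleneck.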
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html