Re: [PATCH v3 2/7] block: Send requeued requests to the I/O scheduler

On 6/22/23 16:45, Damien Le Moal wrote:
> On 6/21/23 09:34, Bart Van Assche wrote:
>> Regarding removing zone write locking, would it be acceptable to
>> implement a solution for SCSI devices before it is clear how to
>> implement a solution for NVMe devices? I think a potential solution for
>> SCSI devices is to send requests that should be requeued to the SCSI
>> error handler instead of to the block layer requeue list. The SCSI error
>> handler waits until all pending requests have timed out or have been
>> sent to the error handler. The SCSI error handler can be modified such
>> that requests are sorted in LBA order before being resubmitted. This
>> would solve the nasty issues that would otherwise arise when requeuing
>> requests if multiple write requests for the same zone are pending.
>
> I am still thinking that a dedicated hctx for writes to sequential zones may be
> the simplest solution for all device types:
> 1) For scsi HBAs, we can likely gain high qd zone writes, but that needs to be
> checked. For AHCI though, we need to keep the max write qd=1 per zone because of
> the chipsets reordering command submissions. So we'll need a queue flag saying
> "need zone write locking" indicated by the adapter when creating the queue.
> 2) For NVMe, this would allow high QD writes, with only the penalty of heavier
> locking overhead when writes are issued from multiple CPUs.
>
> But I have not started looking at all the details. Need to start prototyping
> something. We can try working on this together if you want.

Hi Damien,

I'm interested in collaborating on this. But I'm not sure whether a dedicated hardware queue for sequential writes is a full solution. Applications must submit zoned writes (other than write appends) in order. These zoned writes may end up in a software queue. It is possible that the software queues are flushed in such a way that the zoned writes are reordered. Or do you perhaps want to send all zoned writes directly to a hardware queue? If so, is this really a better solution than a single-queue I/O scheduler? Is the difference perhaps that higher read IOPS can be achieved because multiple hardware queues are used for reads?
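To illustrate the reordering concern, here is a minimal userspace sketch in Python. It is a toy model, not blk-mq code: the function name `dispatch_order` and the "flush each software queue in CPU index order" policy are assumptions made purely for illustration. It only shows that writes issued in LBA order from different CPUs can reach the hardware queue in a different order.

```python
def dispatch_order(submissions, nr_cpus):
    """Toy model of per-CPU software queues feeding one hardware queue.

    submissions: list of (cpu, lba) tuples in the order the application
    issued the writes. The flush policy below (drain queue 0, then queue 1,
    and so on) is a hypothetical one chosen for illustration.
    """
    sw_queues = [[] for _ in range(nr_cpus)]
    for cpu, lba in submissions:
        sw_queues[cpu].append(lba)
    # Drain the software queues one after the other.
    return [lba for queue in sw_queues for lba in queue]


# The submitter migrates between CPUs while writing LBAs 0, 8, 16 in order.
submitted = [(1, 0), (0, 8), (1, 16)]
order = dispatch_order(submitted, nr_cpus=2)
print(order)  # [8, 0, 16]: no longer sorted by LBA
```

In this model the write to LBA 8 is dispatched before the write to LBA 0, which a sequential write required zone would reject.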

Even if all sequential writes were sent to a single hardware queue, supporting queue depths > 1 would still require a mechanism for resubmitting requests in order after a request has been requeued. For example, if three zoned writes are in flight and a unit attention is reported for the second write, then two writes have to be resubmitted, and that resubmission must only happen after both of these writes have completed.
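The three-write scenario can be sketched with a small Python model. All names here (`Write`, `Zone`, `resubmit_in_lba_order`, `run_example`) are hypothetical, and the model assumes that a write which does not start at the write pointer fails, as it would for a sequential write required zone. It is a sketch of the ordering requirement, not of any kernel or SCSI error handler code.

```python
from dataclasses import dataclass


@dataclass
class Write:
    lba: int
    nr_blocks: int


class Zone:
    """Toy zone: a write succeeds only if it starts at the write pointer."""

    def __init__(self, start):
        self.wp = start

    def submit(self, write, inject_unit_attention=False):
        if inject_unit_attention or write.lba != self.wp:
            return False  # the device did not execute the write
        self.wp += write.nr_blocks
        return True


def resubmit_in_lba_order(zone, failed):
    """Resubmit failed writes lowest LBA first; return the order used."""
    order = []
    for write in sorted(failed, key=lambda w: w.lba):
        if not zone.submit(write):
            raise RuntimeError(f"write at LBA {write.lba} failed again")
        order.append(write.lba)
    return order


def run_example():
    zone = Zone(start=0)
    in_flight = [Write(0, 8), Write(8, 8), Write(16, 8)]
    failed = []
    for index, write in enumerate(in_flight):
        # Unit attention on the second write; the third then misses the
        # write pointer and fails as well.
        if not zone.submit(write, inject_unit_attention=(index == 1)):
            failed.append(write)
    # Failures may be reported in any order; model that by reversing the list.
    failed.reverse()
    # Only now, with every in-flight write completed, resubmit in LBA order.
    return zone.wp + 0, resubmit_in_lba_order(zone, failed), zone.wp


wp_before, order, wp_after = run_example()
print(wp_before, order, wp_after)
```

Resubmitting the second write as soon as it fails, while the third is still in flight, would not be safe in this model either: the third write could then land before or after the retry depending on timing.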

Another possibility is to introduce a new request queue flag that specifies that only writes should be sent to the I/O scheduler. I'm interested in this because of the following observation for zoned UFS devices for a block size of 4 KiB and a random read workload:
* mq-deadline scheduler:  59 K IOPS.
* no I/O scheduler:      100 K IOPS.
In other words, about 70% more IOPS with no I/O scheduler than with mq-deadline. I don't think that this indicates a performance bug in the mq-deadline scheduler. From a quick measurement with the null_blk driver it seems to me that all I/O schedulers saturate at around 150 K to 170 K IOPS.

Thanks,

Bart.



