On Mon, Mar 20, 2023 at 10:28 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On 3/17/23 23:29, Christoph Hellwig wrote:
> > On Fri, Mar 17, 2023 at 12:59:36PM -0700, Bart Van Assche wrote:
> >> For zoned storage it is essential that split bios are submitted in LBA order.
> >> This patch series realizes this by modifying __bio_split_to_limits() such that
> >> it submits the first bio fragment and returns the remainder instead of
> >> submitting the remainder and returning the first bio fragment. Please consider
> >> this patch series for the next merge window.
> >
> > Why are you sending large writes using REQ_OP_WRITE and not
> > using REQ_OP_ZONE_APPEND which side steps all these issues?
>
> Hi Christoph,
>
> How to achieve optimal performance with REQ_OP_ZONE_APPEND for SCSI
> devices? My understanding of how REQ_OP_ZONE_APPEND works for SCSI
> devices is as follows:
> * ATA devices cannot support this operation directly because there are
>   not enough bits in the ATA sense data to report where appended data
>   has been written.
> * T10 has not yet started with standardizing a zone append operation.
> * The code that emulates REQ_OP_ZONE_APPEND for SCSI devices (in
>   sd_zbc.c) serializes REQ_OP_ZONE_APPEND operations (QD=1).
> * To achieve optimal performance, QD > 1 is required.

I recall there were dragons lurking, particularly in how we handle
requeues, where submitting in order was not sufficient to guarantee
that IO is actually dispatched in order.

(Of note: when requeueing a request, we splice it to the _end_ of the
hctx dispatch list, so if a requeue happens in the middle of a
multi-segment IO, that IO gets reordered. I recall this change went in
specifically to reorder requests in case a passthrough request was
lurking that could un-jam the device.)

Have you looked at this? Perhaps requeues are a slow path anyway, so we
could sort there?

There may also be other requeue weirdness with layered devices...

Khazhy
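A toy model may make the requeue concern concrete. This is a minimal sketch, not blk-mq code: it assumes a single FIFO dispatch list, one large write split into 256-sector fragments, and a single "busy" rejection of one fragment; `dispatch`, `is_sequential`, `busy_once` and `sort_on_requeue` are hypothetical names. With `sort_on_requeue=False` the requeued fragment is appended to the tail, as described above; `sort_on_requeue=True` sketches the "sort on requeue" idea.

```python
import bisect


def dispatch(fragments, busy_once=None, sort_on_requeue=False):
    """Hypothetical toy dispatcher (not the kernel's): drain a FIFO list of
    fragment start LBAs. The fragment at `busy_once` is rejected on its first
    attempt and requeued, either at the tail or in LBA order.
    Returns start LBAs in the order they reach the device."""
    pending = list(fragments)  # assumed already queued in LBA order
    issued = []
    requeued = False
    while pending:
        lba = pending.pop(0)
        if lba == busy_once and not requeued:
            requeued = True
            if sort_on_requeue:
                bisect.insort(pending, lba)  # keep the list sorted by LBA
            else:
                pending.append(lba)  # requeue splices to the END
            continue
        issued.append(lba)
    return issued


def is_sequential(issued, fragment_len, write_pointer):
    """Hypothetical check: a sequential-write-required zone accepts a write
    only at the current write pointer, which then advances past it."""
    for lba in issued:
        if lba != write_pointer:
            return False
        write_pointer = lba + fragment_len
    return True


frags = [0, 256, 512, 768]
print(dispatch(frags, busy_once=256))                        # [0, 512, 768, 256]
print(dispatch(frags, busy_once=256, sort_on_requeue=True))  # [0, 256, 512, 768]
```

In the tail-requeue case the fragment at LBA 512 reaches the device while the zone's write pointer is still at 256, so a zoned device would reject it even though the bios were submitted in LBA order.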