Re: [PATCHv10 0/9] write hints with nvme fdp, scsi streams

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Wed, 27 Nov 2024 21:09:55 -0500

Bart,

> What if the source LBA range does not require splitting but the
> destination LBA range requires splitting, e.g. because it crosses a
> chunk_sectors boundary? Will the REQ_OP_COPY_IN operation succeed in
> this case and the REQ_OP_COPY_OUT operation fail?

Yes.

I experimented with approaching splitting in an iterative fashion. And
thus, if there was a split halfway through the COPY_IN I/O, we'd issue a
corresponding COPY_OUT up to the split point and hope that the write
subsequently didn't need a split. And then deal with the next segment.

However, given that copy offload offers diminishing returns for small
I/Os, it was not worth the hassle for the devices I used for
development. It was cleaner and faster to just fall back to regular
read/write when a split was required.

> Does this mean that a third operation is needed to cancel
> REQ_OP_COPY_IN operations if the REQ_OP_COPY_OUT operation fails?

No. The device times out the token.

> Additionally, how to handle bugs in REQ_OP_COPY_* submitters where a
> large number of REQ_OP_COPY_IN operations is submitted without
> corresponding REQ_OP_COPY_OUT operation? Is perhaps a mechanism
> required to discard unmatched REQ_OP_COPY_IN operations after a
> certain time?

See above.

For your EXTENDED COPY use case there is no token and thus the COPY_IN
completes immediately.

And for the token case, if you populate a million tokens and don't use
them before they time out, it sounds like your submitting code is badly
broken. But it doesn't matter because there are no I/Os in flight and
thus nothing to discard.

> Hmm ... we may each have a different opinion about whether or not the
> COPY_IN/COPY_OUT semantics are a requirement for token-based copy
> offloading.

Maybe. But you'll have a hard time convincing me to add any kind of
state machine or bio matching magic to the SCSI stack when the simplest
solution is to treat copying like a read followed by a write. There is
no concurrency, no kernel state, no dependency between two commands, nor
two scsi_disk/scsi_device object lifetimes to manage.

> Additionally, I'm not convinced that implementing COPY_IN/COPY_OUT for
> ODX devices is that simple. The COPY_IN and COPY_OUT operations have
> to be translated into three SCSI commands, isn't it? I'm referring to
> the POPULATE TOKEN, RECEIVE ROD TOKEN INFORMATION and WRITE USING
> TOKEN commands. What is your opinion about how to translate the two
> block layer operations into these three SCSI commands?

COPY_IN is translated to a NOP for devices implementing EXTENDED COPY
and a POPULATE TOKEN for devices using tokens.

COPY_OUT is translated to an EXTENDED COPY (or NVMe Copy) for devices
using a single command approach and WRITE USING TOKEN for devices using
tokens.

There is no need for RECEIVE ROD TOKEN INFORMATION.

I am not aware of UFS devices using the token-based approach. And for
EXTENDED COPY there is only a single command sent to the device. If you
want to do power management while that command is being processed,
please deal with that in UFS. The block layer doesn't deal with the
async variants of any of the other SCSI commands either...

-- 
Martin K. Petersen	Oracle Linux Engineering