Re: [PATCHv10 0/9] write hints with nvme fdp, scsi streams

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Wed, 11 Dec 2024 14:38:24 -0500

Bart,

>> What would be the benefit of submitting these operations concurrently?
>
> I expect that submitting the two copy operations concurrently would
> result in lower latency for NVMe devices because the REQ_OP_COPY_DST
> operation can be submitted without waiting for the REQ_OP_COPY_SRC
> result.

Perhaps you are engaging in premature optimization?

> If the block layer would have to manage the ROD token, how would the
> ROD token be provided to the block layer?

In the data buffer described by the bio, of course. Just like the data
buffer when we do a READ. The only difference here is that the data is
compressed to a fixed size and thus only 512 bytes long regardless of
the number of logical blocks described by the operation.

> Bidirectional commands have been removed from the Linux kernel a while
> ago so the REQ_OP_COPY_IN parameter data would have to be used to pass
> parameters to the SCSI driver and also to pass the ROD token back to
> the block layer.

A normal READ operation also passes parameters to the SCSI driver. These
are the start LBA and the transfer length. That does not make it a
bidirectional command.

> While this can be implemented, I'm not sure that we should integrate
> support in the block layer for managing ROD tokens since ROD tokens
> are a concept that is specific to the SCSI protocol.

A well-known commercial operating system supports copy offload via the
token-based approach. I don't see any reason why our implementation
should exclude a wide variety of devices in the industry supported by
that platform. And obviously, given that this other operating system
uses a token-based implementation in their stack, one could perhaps
envision this capability appearing in other protocols in the future?

In any case, I only have two horses in this race:

1. Make sure that our user API and block layer implementation are
   flexible enough to accommodate current and future offload
   specifications.

2. Make sure our implementation is as simple as possible.

Splitting the block layer implementation into a semantic read followed
by a semantic write permits token-based offload to be supported. It also
makes the implementation simple because there is no concurrency element.
The only state is owned by the entity which issues the bio. No lookups,
no timeouts, no allocating things in sd.c and hoping that somebody
remembers to free them later despite the disk suddenly going away.
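
To illustrate the shape of this, here is a toy userspace sketch of the
two-phase flow. It is not kernel code: toy_disk, toy_token, toy_copy_src()
and toy_copy_dst() are names made up for this example, and the token
layout is an assumption. It only shows that the 512-byte buffer owned by
the issuer is the sole state carried between the two phases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_SIZE  512  /* fixed-size token buffer, as described above */
#define DISK_BLOCKS 64   /* toy device size, in blocks (assumption) */
#define BLOCK_SIZE  16   /* tiny blocks keep the toy small (assumption) */

/* Hypothetical stand-in for a device; not a kernel structure. */
struct toy_disk {
	uint8_t data[DISK_BLOCKS * BLOCK_SIZE];
};

/* Hypothetical token: describes the source range, carries no data. */
struct toy_token {
	const struct toy_disk *src;
	uint32_t lba;
	uint32_t nr_blocks;
};

_Static_assert(sizeof(struct toy_token) <= TOKEN_SIZE, "token must fit");

/* Phase 1, the semantic read: fill the caller-owned buffer with a token. */
static int toy_copy_src(const struct toy_disk *d, uint32_t lba, uint32_t nr,
			uint8_t buf[TOKEN_SIZE])
{
	struct toy_token t = { .src = d, .lba = lba, .nr_blocks = nr };

	if (nr == 0 || lba >= DISK_BLOCKS || nr > DISK_BLOCKS - lba)
		return -1;
	memset(buf, 0, TOKEN_SIZE);
	memcpy(buf, &t, sizeof(t));
	return 0;
}

/* Phase 2, the semantic write: consume the token, move the blocks. */
static int toy_copy_dst(struct toy_disk *d, uint32_t lba,
			const uint8_t buf[TOKEN_SIZE])
{
	struct toy_token t;

	memcpy(&t, buf, sizeof(t));
	if (t.src == NULL || t.nr_blocks == 0 ||
	    lba >= DISK_BLOCKS || t.nr_blocks > DISK_BLOCKS - lba)
		return -1;
	memmove(d->data + (size_t)lba * BLOCK_SIZE,
		t.src->data + (size_t)t.lba * BLOCK_SIZE,
		(size_t)t.nr_blocks * BLOCK_SIZE);
	return 0;
}

/* The issuer owns the token buffer; nothing is looked up or timed out. */
static int toy_offload_copy(const struct toy_disk *src, uint32_t src_lba,
			    struct toy_disk *dst, uint32_t dst_lba,
			    uint32_t nr)
{
	uint8_t token[TOKEN_SIZE];

	if (toy_copy_src(src, src_lba, nr, token))
		return -1;
	return toy_copy_dst(dst, dst_lba, token);
}
```

The token lives on the issuer's stack in toy_offload_copy() and goes
away when the function returns, so there is nothing for a lower layer
to allocate, track or free.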

Even if we were to not support the token-based approach and only do
single-command offload, I still think the two-phase operation makes
things simpler and more elegant.

-- 
Martin K. Petersen	Oracle Linux Engineering