Bart,

>> What would be the benefit of submitting these operations concurrently?
>
> I expect that submitting the two copy operations concurrently would
> result in lower latency for NVMe devices because the REQ_OP_COPY_DST
> operation can be submitted without waiting for the REQ_OP_COPY_SRC
> result.

Perhaps you are engaging in premature optimization?

> If the block layer would have to manage the ROD token, how would the
> ROD token be provided to the block layer?

In the data buffer described by the bio, of course. Just like the data
buffer when we do a READ. The only difference here is that the data is
compressed to a fixed size and is thus only 512 bytes long, regardless
of the number of logical blocks described by the operation.

> Bidirectional commands have been removed from the Linux kernel a while
> ago so the REQ_OP_COPY_IN parameter data would have to be used to pass
> parameters to the SCSI driver and also to pass the ROD token back to
> the block layer.

A normal READ operation also passes parameters to the SCSI driver: the
start LBA and the transfer length. That does not make it a
bidirectional command.

> While this can be implemented, I'm not sure that we should integrate
> support in the block layer for managing ROD tokens since ROD tokens
> are a concept that is specific to the SCSI protocol.

A well-known commercial operating system supports copy offload via the
token-based approach. I don't see any reason why our implementation
should exclude the wide variety of devices in the industry supported by
that platform. And given that this other operating system uses a
token-based implementation in its stack, one could perhaps envision
this capability appearing in other protocols in the future?

In any case, I only have two horses in this race:

1. Make sure that our user API and block layer implementation are
   flexible enough to accommodate current and future offload
   specifications.

2. Make sure our implementation is as simple as possible.
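To make the READ/WRITE analogy concrete, here is a minimal user-space C sketch, not kernel code. It assumes a simulated in-memory disk and uses hypothetical names (copy_src, copy_dst, offload_copy, struct copy_token) rather than the actual REQ_OP_COPY_* interface; the token layout is invented for illustration, since a real ROD token is opaque to the host. What it tries to show is that the token travels in an ordinary fixed-size 512-byte data buffer owned by the issuer: phase one fills it like a READ, phase two hands it back like a WRITE.

```c
/*
 * Minimal user-space sketch of two-phase, token-based copy offload.
 * All names here are hypothetical; they are NOT the kernel's
 * REQ_OP_COPY_* interface, and the token layout is made up.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NR_BLOCKS  64
#define TOKEN_SIZE 512 /* fixed, regardless of how many blocks are copied */

/* Simulated disk contents: this stands in for the device. */
static uint8_t disk[NR_BLOCKS * BLOCK_SIZE];

/* Hypothetical token layout; a real ROD token is opaque to the host. */
struct copy_token {
	uint64_t src_lba;
	uint32_t nr_blocks;
	uint8_t  reserved[TOKEN_SIZE - sizeof(uint64_t) - sizeof(uint32_t)];
};
_Static_assert(sizeof(struct copy_token) == TOKEN_SIZE, "token must be 512 bytes");

/* Non-empty range that fits on the simulated disk. */
static int range_ok(uint64_t lba, uint32_t nr)
{
	return nr > 0 && lba < NR_BLOCKS && nr <= NR_BLOCKS - lba;
}

/*
 * Phase 1, the "semantic read": like a READ, the caller supplies the
 * start LBA, the length and a data buffer, and the device fills the
 * buffer. Here the data returned is a token describing the source.
 */
int copy_src(uint64_t lba, uint32_t nr, void *buf, size_t len)
{
	struct copy_token tok;

	if (len != TOKEN_SIZE || !range_ok(lba, nr))
		return -1;
	memset(&tok, 0, sizeof(tok));
	tok.src_lba = lba;
	tok.nr_blocks = nr;
	memcpy(buf, &tok, sizeof(tok));
	return 0;
}

/*
 * Phase 2, the "semantic write": like a WRITE, the caller hands the
 * buffer back and the device performs the copy the token describes.
 * A real device would copy a point-in-time image of the source; this
 * sketch simply reads the source blocks at write time.
 */
int copy_dst(uint64_t lba, const void *buf, size_t len)
{
	struct copy_token tok;

	if (len != TOKEN_SIZE)
		return -1;
	memcpy(&tok, buf, sizeof(tok));
	if (!range_ok(tok.src_lba, tok.nr_blocks) || !range_ok(lba, tok.nr_blocks))
		return -1;
	/* memmove tolerates overlapping source and destination ranges. */
	memmove(disk + lba * BLOCK_SIZE, disk + tok.src_lba * BLOCK_SIZE,
		(size_t)tok.nr_blocks * BLOCK_SIZE);
	return 0;
}

/*
 * Issuer side: the only state is the token buffer this function owns.
 * No lookups, no timeouts, nothing allocated in a driver to free later.
 */
int offload_copy(uint64_t src, uint64_t dst, uint32_t nr)
{
	uint8_t token[TOKEN_SIZE];

	if (copy_src(src, nr, token, sizeof(token)))
		return -1;
	return copy_dst(dst, token, sizeof(token));
}
```

The token is the same 512 bytes whether one block or the whole disk is copied, and the two phases run strictly one after the other, so nothing outside offload_copy() has to track an outstanding copy.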
Splitting the block layer implementation into a semantic read followed
by a semantic write makes it possible to support token-based offload.
It also keeps the implementation simple because there is no concurrency
element. The only state is owned by the entity which issues the bio. No
lookups, no timeouts, no allocating things in sd.c and hoping that
somebody remembers to free them later despite the disk suddenly going
away.

Even if we were to not support the token-based approach and only do
single-command offload, I still think the two-phase operation makes
things simpler and more elegant.

-- 
Martin K. Petersen	Oracle Linux Engineering