Re: [PATCHv10 0/9] write hints with nvme fdp, scsi streams

Damien Le Moal <dlemoal@xxxxxxxxxx> · Tue, 10 Dec 2024 08:13:49 +0900

On 12/10/24 07:13, Bart Van Assche wrote:
> On 12/5/24 12:03 AM, Nitesh Shetty wrote:
>> But where do we store the read sector info before sending write.
>> I see 2 approaches here,
>> 1. Should it be part of a payload along with write ?
>>      We did something similar in previous series which was not liked
>>      by Christoph and Bart.
>> 2. Or driver should store it as part of an internal list inside
>> namespace/ctrl data structure ?
>>      As Bart pointed out, here we might need to send one more fail
>>      request later if copy_write fails to land in same driver.
> 
> Hi Nitesh,
> 
> Consider the following example: dm-linear is used to concatenate two
> block devices. An NVMe device (LBA 0..999) and a SCSI device (LBA
> 1000..1999). Suppose that a copy operation is submitted to the dm-linear
> device to copy LBAs 1..998 to LBAs 2..1998. If the copy operation is
> submitted as two separate operations (REQ_OP_COPY_SRC and
> REQ_OP_COPY_DST) then the NVMe device will receive the REQ_OP_COPY_SRC
> operation and the SCSI device will receive the REQ_OP_COPY_DST
> operation. The NVMe and SCSI device drivers should fail the copy 
> operations after a timeout because they only received half of the copy
> operation. After the timeout the block layer core can switch from
> offloading to emulating a copy operation. Waiting for a timeout is
> necessary because requests may be reordered.
> 
> I think this is a strong argument in favor of representing copy
> operations as a single operation. This will allow stacking drivers
> as dm-linear to deal in an elegant way with copy offload requests
> where source and destination LBA ranges map onto different block
> devices and potentially different block drivers.

Why? As long as REQ_OP_COPY_SRC carries both source and destination
information, DM can trivially detect that the copy is not within a single device
and either return ENOTSUPP or switch to regular read+write operations
using block layer helpers. Or the block layer can fall back to that emulation
itself if it gets an ENOTSUPP from the device.
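
To make the detection concrete, here is a rough userspace sketch, not kernel
code. The names linear_target, find_target and copy_check are made up for
illustration, the table layout is an assumption, and EOPNOTSUPP stands in for
the kernel-internal ENOTSUPP, which userspace errno.h does not define.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one dm-linear target: LBAs [start, start + len)
 * of the mapped device are backed by the underlying device dev_id.
 */
struct linear_target {
	uint64_t start;
	uint64_t len;
	int dev_id;
};

/* Return the index of the target fully containing [lba, lba + nr), or -1. */
static int find_target(const struct linear_target *t, size_t n,
		       uint64_t lba, uint64_t nr)
{
	for (size_t i = 0; i < n; i++) {
		/* Written to avoid overflow in lba + nr. */
		if (lba >= t[i].start && nr <= t[i].len &&
		    lba - t[i].start <= t[i].len - nr)
			return (int)i;
	}
	return -1;
}

/*
 * Return 0 if source and destination ranges both sit entirely on the same
 * underlying device, so the copy could be offloaded. Return -EOPNOTSUPP
 * otherwise, so that the caller can fall back to read+write emulation.
 */
static int copy_check(const struct linear_target *t, size_t n,
		      uint64_t src, uint64_t dst, uint64_t nr)
{
	int s = find_target(t, n, src, nr);
	int d = find_target(t, n, dst, nr);

	if (s < 0 || d < 0 || t[s].dev_id != t[d].dev_id)
		return -EOPNOTSUPP;
	return 0;
}
```

With a table matching Bart's example (LBA 0..999 on one device, 1000..1999 on
another), a copy whose destination lands on the second device is rejected
immediately, with no timeout involved.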

I am not sure what a REQ_OP_COPY_SRC BIO definition would look like. Ideally, we
want to be able to describe several source LBA ranges with it and, for the above
issue, also have the destination LBA range. If we can do that in a nice
way, I do not see the need for switching back to a single BIO, though we could
do that too I guess. From what Martin said for scsi token-based copy, it seems
that using 2 operations is easier. Knowing how the scsi stack works, I can see
that too.
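
Purely as an illustration of the information such a definition would need to
carry, and not as a proposed kernel interface: the struct names, the fixed
range limit and the validation rule below are all assumptions made for this
sketch, written as plain userspace C.

```c
#include <stddef.h>
#include <stdint.h>

#define COPY_MAX_SRC_RANGES 8	/* arbitrary limit chosen for this sketch */

/* One contiguous LBA range: nr blocks starting at lba. */
struct copy_range {
	uint64_t lba;
	uint64_t nr;
};

/*
 * Hypothetical payload: several source ranges plus the destination range,
 * so that a stacking driver sees both sides of the copy at once.
 */
struct copy_desc {
	struct copy_range dst;
	unsigned int nr_src;
	struct copy_range src[COPY_MAX_SRC_RANGES];
};

/*
 * Return 0 if the descriptor is self-consistent, that is, it has at least
 * one non-empty source range and the source lengths add up to the
 * destination length. Return -1 otherwise.
 */
static int copy_desc_validate(const struct copy_desc *d)
{
	uint64_t total = 0;

	if (d->nr_src == 0 || d->nr_src > COPY_MAX_SRC_RANGES)
		return -1;

	for (unsigned int i = 0; i < d->nr_src; i++) {
		/* Reject empty ranges and guard the sum against overflow. */
		if (d->src[i].nr == 0 || d->src[i].nr > UINT64_MAX - total)
			return -1;
		total += d->src[i].nr;
	}

	return total == d->dst.nr ? 0 : -1;
}
```

Whatever the real layout ends up being, the point is that the destination
travels together with the source ranges.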

-- 
Damien Le Moal
Western Digital Research