On 6/24/24 3:44 AM, Nitesh Shetty wrote:
For reference, I have listed the approaches we have taken in the past. a. Token/payload based approach: 1. Here we allocate a buffer/payload. 2. First source BIO is sent along with the buffer. 3. Once the buffer reaches driver, it is filled with the source LBA and length and namespace info. And the request is completed. 4. Then destination BIO is sent with same buffer. 5. Once the buffer reaches driver, it retrieves the source information from the BIO and forms a copy command and sends it down to device. We received feedback that putting anything inside payload which is not data, is not a good idea[1].
A token-based approach (pairing copy_src and copy_dst based on a token) is completely different from a payload-based approach (copy offload parameters stored in the bio payload). From [1] (I agree with what has been quoted): "In general every time we tried to come up with a request payload that is not just data passed to the device it has been a nightmare." [ ... ] "The only thing we'd need is a sequence number / idr / etc to find an input and output side match up, as long as we stick to the proper namespace scope."
c. List/ctx based approach: A new member is added to bio, bio_copy_ctx, which will a union with bi_integrity. Idea is once a copy bio reaches blk_mq_submit_bio, it will add the bio to this list. 1. Send the destination BIO, once this reaches blk_mq_submit_bio, this will add the destination BIO to the list inside bi_copy_ctx and return without forming any request. 2. Send source BIO, once this reaches blk_mq_submit_bio, this will retrieve the destination BIO from bi_copy_ctx and form a request with destination BIO and source BIO. After this request will be sent to driver. This work is still in POC phase[2]. But this approach makes lifetime management of BIO complicated, especially during failure cases.
Associating src and dst operations by embedding a pointer to a third data structure in struct bio is an implementation choice and is not the only possibility for assocating src and dst operations. Hence, the bio lifetime complexity mentioned above is not inherent to the list based approach but is a result of the implementation choice made for associating src and dst operations. Has it been considered to combine the list-based approach for managing unpaired copy operations with the token based approach for pairing copy src and copy dst operations? Thanks, Bart.