On 23/11/2021 12:09, Qu Wenruo wrote:
> On 2021/11/23 16:13, Christoph Hellwig wrote:
>> On Tue, Nov 23, 2021 at 04:10:35PM +0800, Qu Wenruo wrote:
>>> Without bio_chain() sounds pretty good, as we can still utilize
>>> bi_end_io and bi_private.
>>>
>>> But this also means we're now responsible for not releasing the source
>>> bio, since it has the real bi_io_vec.
>>
>> Just call bio_inc_remaining before submitting the cloned bio, and then
>> call bio_endio on the root bio every time a clone completes.
>>
> Yeah, that sounds pretty good for regular usage.
>
> But there is another very tricky case involved.
>
> btrfs supports zoned devices, so we have special call sites to
> switch between bio_add_page() and bio_add_zoned_append_page().
>
> But a zoned write can't be split, nor is there an easy way to directly
> convert a regular bio into a bio with zoned append pages.
>
> Currently, if we take the slow path (allocate a new bio, add pages from
> the original bio, then advance the original bio), we're able to do the
> conversion from a regular bio to a zoned append bio.
>
> Any idea on this corner case?

I think we have to differentiate two cases here: a "regular"
REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND bio.

The first one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split,
because we cannot guarantee the order in which the device writes the data
to disk.

The RAID stripe bio, however, can be split into two (or more) parts that
will end up on _different_ devices. All we need to do is a) ensure each
part doesn't exceed the device's zone append limit and b) clamp each
part's bi_iter.bi_sector down to the start of its target zone, a.k.a.
sticking to the rules of REQ_OP_ZONE_APPEND.
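To make the two rules concrete, together with the bio_inc_remaining()/bio_endio() completion pattern suggested upthread, here is a plain userspace C model. It is a hypothetical sketch, not btrfs or block-layer code: `struct model_bio`, `struct model_dev`, `split_stripe_append()`, `model_bio_endio()` and `model_clone_complete()` are invented names. It also assumes a simple RAID0-style layout (stripe `i` goes to device `i % ndevs`) and rejects any bio that would place two parts on the same device, since those two appends could complete in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_DEVS 16

/* Hypothetical userspace stand-in for a bio; not the kernel's struct bio. */
struct model_bio {
	uint64_t sector;          /* logical sector (parent) or zone start (clone) */
	uint32_t len;             /* length in sectors */
	int dev;                  /* target device index (unused for the parent) */
	int remaining;            /* models the bio's remaining-completions counter */
	bool ended;               /* set where bi_end_io would be called */
	struct model_bio *parent; /* root bio a clone reports to */
};

/* Hypothetical per-device zone geometry (zone_sectors must be non-zero). */
struct model_dev {
	uint64_t zone_sectors;       /* zone size in sectors */
	uint32_t max_append_sectors; /* device's zone append limit */
};

/* Models bio_endio() on a chained bio: drop one reference, complete at zero. */
void model_bio_endio(struct model_bio *bio)
{
	assert(bio->remaining > 0);
	if (--bio->remaining == 0)
		bio->ended = true;
}

/* A clone finished: mark it done and report to the root bio. */
void model_clone_complete(struct model_bio *clone)
{
	clone->ended = true;
	model_bio_endio(clone->parent);
}

/*
 * Split a RAID0-style stripe bio into one zone append part per device.
 * Returns the number of parts, or -1 if a rule would be violated; the
 * parent's counter is only touched on success.
 */
int split_stripe_append(struct model_bio *parent, uint64_t stripe_len,
			int ndevs, const struct model_dev *devs,
			struct model_bio *out, int max_out)
{
	bool used[MODEL_MAX_DEVS] = { false };
	uint64_t pos = parent->sector;
	uint64_t end = parent->sector + parent->len;
	int n = 0;

	if (ndevs <= 0 || ndevs > MODEL_MAX_DEVS || stripe_len == 0)
		return -1;

	while (pos < end) {
		uint64_t stripe_nr = pos / stripe_len;
		uint64_t in_stripe = pos % stripe_len;
		uint64_t chunk = stripe_len - in_stripe;
		int dev = (int)(stripe_nr % (uint64_t)ndevs);
		uint64_t phys = (stripe_nr / (uint64_t)ndevs) * stripe_len + in_stripe;

		if (chunk > end - pos)
			chunk = end - pos;
		/* Two parts on one device would be unordered appends: reject. */
		if (used[dev] || n == max_out)
			return -1;
		/* Rule a): the part must fit the device's zone append limit. */
		if (chunk > devs[dev].max_append_sectors)
			return -1;

		used[dev] = true;
		/* Rule b): aim the part at the start of its target zone. */
		out[n].sector = phys - phys % devs[dev].zone_sectors;
		out[n].len = (uint32_t)chunk;
		out[n].dev = dev;
		out[n].remaining = 1;
		out[n].ended = false;
		out[n].parent = parent;
		n++;
		pos += chunk;
	}

	parent->remaining += n; /* models one bio_inc_remaining() per clone */
	return n;
}
```

With 128-sector stripes over two devices and 1024-sector zones, a bio at logical sector 4160 spanning 192 sectors yields a 64-sector part for device 0 and a 128-sector part for device 1, both aimed at sector 2048 (their zone start). The root only "completes" once both clones have reported back and the submitter has dropped its own initial reference.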