On Thu, Sep 01, 2022 at 10:42:06AM +0300, Christoph Hellwig wrote:
> Currently the I/O submitters have to split bios according to the
> chunk stripe boundaries. This leads to extra lookups in the extent
> trees and a lot of boilerplate code.
>
> To drop this requirement, split the bio when __btrfs_map_block
> returns a mapping that is smaller than the requested size and
> keep a count of pending bios in the original btrfs_bio so that
> the upper level completion is only invoked when all clones have
> completed.
>
> Based on a patch from Qu Wenruo.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  fs/btrfs/volumes.c | 106 +++++++++++++++++++++++++++++++++++++--------
>  fs/btrfs/volumes.h |   1 +
>  2 files changed, 90 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 5c6535e10085d..0a2d144c20604 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -35,6 +35,7 @@
>  #include "zoned.h"
>
>  static struct bio_set btrfs_bioset;
> +static struct bio_set btrfs_clone_bioset;
>  static struct bio_set btrfs_repair_bioset;
>  static mempool_t btrfs_failed_bio_pool;
>
> @@ -6661,6 +6662,7 @@ static void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode,
>  	bbio->inode = inode;
>  	bbio->end_io = end_io;
>  	bbio->private = private;
> +	atomic_set(&bbio->pending_ios, 1);
>  }
>
>  /*
> @@ -6698,6 +6700,57 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
>  	return bio;
>  }
>
> +static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length)
> +{
> +	struct btrfs_bio *orig_bbio = btrfs_bio(orig);
> +	struct bio *bio;
> +
> +	bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS,
> +			&btrfs_clone_bioset);
> +	btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio);
> +
> +	btrfs_bio(bio)->file_offset = orig_bbio->file_offset;
> +	orig_bbio->file_offset += map_length;

I'm worried about this for the ONE_ORDERED case.  We use ONE_ORDERED
precisely because file_offset is the start of the ordered extent while the
bio's length can extend past that ordered extent's range, so if a split
advances file_offset here we won't find our ordered extent and things will
go quite wrong.  Instead we should do something like

	if (!(orig->bi_opf & REQ_BTRFS_ONE_ORDERED))
		orig_bbio->file_offset += map_length;

I've cc'ed Omar since he's the one who added this, and I'm a little confused
about how this case can happen in the first place.

> +
> +	atomic_inc(&orig_bbio->pending_ios);
> +	return bio;
> +}
> +
> +static void btrfs_orig_write_end_io(struct bio *bio);
> +static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio,
> +				       struct btrfs_bio *orig_bbio)
> +{
> +	/*
> +	 * For writes btrfs tolerates nr_mirrors - 1 write failures, so we
> +	 * can't just blindly propagate a write failure here.
> +	 * Instead increment the error count in the original I/O context so
> +	 * that it is guaranteed to be larger than the error tolerance.
> +	 */
> +	if (bbio->bio.bi_end_io == &btrfs_orig_write_end_io) {
> +		struct btrfs_io_stripe *orig_stripe = orig_bbio->bio.bi_private;
> +		struct btrfs_io_context *orig_bioc = orig_stripe->bioc;
> +

Whitespace error here.

Thanks,

Josef
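
P.S. To make sure I'm reading the pending_ios refcounting right, here's the
completion side I'm assuming pairs with btrfs_split_bio().  This is my own
sketch, not quoted from your patch; the name btrfs_orig_bbio_end_io, the
bi_pool check, and the end_io calling convention are guesses on my part, so
correct me if the real thing differs:

	static void btrfs_orig_bbio_end_io(struct btrfs_bio *bbio)
	{
		/*
		 * If this is a clone from btrfs_clone_bioset, fold its status
		 * into the original bbio and drop the clone before counting.
		 */
		if (bbio->bio.bi_pool == &btrfs_clone_bioset) {
			struct btrfs_bio *orig_bbio = bbio->private;

			if (bbio->bio.bi_status)
				btrfs_bbio_propagate_error(bbio, orig_bbio);
			bio_put(&bbio->bio);
			bbio = orig_bbio;
		}

		/*
		 * pending_ios starts at 1 in btrfs_bio_init() and is bumped
		 * once per split in btrfs_split_bio(), so the upper level
		 * end_io fires exactly once, after the last clone completes.
		 */
		if (atomic_dec_and_test(&bbio->pending_ios))
			bbio->end_io(bbio);
	}

With that reading, a bio that never gets split just drops the initial
reference and completes immediately, which is why btrfs_bio_init() seeds the
count at 1 rather than 0.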