On Thu, Oct 22, 2020 at 04:40:11PM -0700, Eric Biggers wrote:
> On Thu, Oct 22, 2020 at 10:22:25PM +0100, Matthew Wilcox (Oracle) wrote:
> > +static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
> > +		unsigned int nr, struct buffer_head **bhs)
> > +{
> > +	struct bio *bio = NULL;
> > +	unsigned int i;
> > +	int err;
> > +
> > +	blk_completion_init(cmpl, nr);
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		struct buffer_head *bh = bhs[i];
> > +		sector_t sector = bh->b_blocknr * (bh->b_size >> 9);
> > +		bool same_page;
> > +
> > +		if (buffer_uptodate(bh)) {
> > +			end_buffer_async_read(bh, 1);
> > +			blk_completion_sub(cmpl, BLK_STS_OK, 1);
> > +			continue;
> > +		}
> > +		if (bio) {
> > +			if (bio_end_sector(bio) == sector &&
> > +			    __bio_try_merge_page(bio, bh->b_page, bh->b_size,
> > +					bh_offset(bh), &same_page))
> > +				continue;
> > +			submit_bio(bio);
> > +		}
> > +		bio = bio_alloc(GFP_NOIO, 1);
> > +		bio_set_dev(bio, bh->b_bdev);
> > +		bio->bi_iter.bi_sector = sector;
> > +		bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> > +		bio->bi_end_io = readpage_end_bio;
> > +		bio->bi_private = cmpl;
> > +		/* Take care of bh's that straddle the end of the device */
> > +		guard_bio_eod(bio);
> > +	}
>
> The following is needed to set the bio encryption context for the
> '-o inlinecrypt' case on ext4:
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 95c338e2b99c..546a08c5003b 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2237,6 +2237,7 @@ static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
>  			submit_bio(bio);
>  		}
>  		bio = bio_alloc(GFP_NOIO, 1);
> +		fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
>  		bio_set_dev(bio, bh->b_bdev);
>  		bio->bi_iter.bi_sector = sector;
>  		bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));

Thanks! I saw that and had every intention of copying it across.
And then I forgot. I'll add that.

I'm also going to do:

-			    __bio_try_merge_page(bio, bh->b_page, bh->b_size,
-					bh_offset(bh), &same_page))
+			    bio_add_page(bio, bh->b_page, bh->b_size,
+					bh_offset(bh)))

I wonder about allocating bios that can accommodate more bvecs. Not sure
how often filesystems have adjacent blocks which go into non-adjacent
sub-page blocks. It's certainly possible that a filesystem might have
a page consisting of DDhhDDDD ('D' for Data, 'h' for hole), but how
likely is it to have written the two data chunks next to each other?
Maybe with O_SYNC?

Anyway, this patchset needs some more thought because I've just seen
the path from mpage_readahead() to block_read_full_page() that should
definitely not be synchronous.
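To make the bvec question above concrete, here is a rough userspace
sketch (not kernel code) that counts how many bios a read would need for
a given set of sub-page blocks. Everything in it is an assumption for
illustration: struct blk is a hypothetical stand-in for a buffer_head,
count_bios() only approximates the merge rule in the loop above, and
holes are modelled by leaving those blocks out of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: one sub-page block to read. */
struct blk {
	unsigned long long blocknr;	/* on-disk block number, in units of size */
	unsigned int size;		/* block size in bytes */
	unsigned int offset;		/* byte offset of the block within the page */
};

/* Same conversion as the patch: block number to 512-byte sector. */
static unsigned long long blk_sector(const struct blk *b)
{
	return b->blocknr * (b->size >> 9);
}

/*
 * Count the bios needed when each bio holds at most max_vecs segments.
 * A block joins the current bio only if it starts at the sector where the
 * bio ends; it also needs a fresh segment unless it is contiguous in the
 * page with the previous block.
 */
static unsigned int count_bios(const struct blk *blks, size_t nr,
			       unsigned int max_vecs)
{
	unsigned int bios = 0, vecs = 0, end_offset = 0;
	unsigned long long end_sector = 0;

	assert(max_vecs >= 1);

	for (size_t i = 0; i < nr; i++) {
		const struct blk *b = &blks[i];
		bool disk_contig = bios && blk_sector(b) == end_sector;
		bool page_contig = b->offset == end_offset;

		if (disk_contig && page_contig) {
			/* Extends the last segment of the current bio. */
		} else if (disk_contig && vecs < max_vecs) {
			vecs++;		/* New segment in the same bio. */
		} else {
			bios++;		/* Start a new bio. */
			vecs = 1;
		}
		end_sector = blk_sector(b) + (b->size >> 9);
		end_offset = b->offset + b->size;
	}
	return bios;
}
```

For the DDhhDDDD page with 512-byte blocks, if the two data chunks sit at
disk blocks 100-101 and 102-105, count_bios() returns 2 with max_vecs = 1
and 1 with max_vecs = 2. If the chunks are not adjacent on disk, both
cases need 2 bios, so the extra bvec only pays off for the adjacent
layout.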