On Wed, Sep 27, 2023 at 06:03:26PM -0400, Kent Overstreet wrote: > On Wed, Sep 27, 2023 at 07:23:38AM -0400, Brian Foster wrote: > > The bcachefs folio writeback code includes a bio full check as well > > as a fixed size check when it determines whether to submit the > > current write op or continue to add to the current bio. The current > > code submits prematurely when the current folio fits exactly in the > > remaining space allowed in the current bio, which typically results > > in an extent merge that would have otherwise been unnecessary. This > > can be observed with a buffered write sized exactly to the current > > maximum value (1MB) and with key_merging_disabled=1. The latter > > prevents the merge from the second write such that a subsequent > > check of the extent list shows a 1020k extent followed by a > > contiguous 4k extent. > > > > It's not totally clear why the fixed write size check exists. > > bio_full() already checks that the bio can accommodate the current > > dirty range being processed, so the only other concern is write > > latency. Even then, a 1MB cap seems rather small. For reference, > > iomap includes a folio batch size (of 4k) to mitigate latency > > associated with writeback completion folio processing, but that > > restricts writeback bios to somewhere in the range of 16MB-256MB > > depending on folio size (i.e. considering 4k to 64k pages). Unless > > there is some known reason for it, remove the size limit and rely on > > bio_full() to cap the size of the bio. > > We definitely need some sort of a cap; otherwise, there's nothing > preventing us from building up gigabyte+ bios (multipage bvecs, large > folios), and we don't want that. > Yeah, I forgot about the multipage bvecs case when I was first reading through this code. > This probably needs to be a sysctl - better would be a hint provided by > the filesystem based on the performance characteristics of the > device(s), but the writeback code doesn't know which device we're > writing to so that'll be tricky to plumb. > Agree, but IIUC it looks like there's more work to do (related to the write bounce code) before we can change this limit. I've got a v2 of this patch, which I'll post shortly, that retains the limit, fixes up the logic wart that originally brought attention to this code, and adds some comments. Thanks for the additional context. Brian > iomap has IOEND_BATCH_SIZE, which is another hard coded limit; perhaps > iomap could use the new sysctl as well. > > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > > fs/bcachefs/fs-io-buffered.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c > > index 58ccc7b91ac7..d438b93a3a30 100644 > > --- a/fs/bcachefs/fs-io-buffered.c > > +++ b/fs/bcachefs/fs-io-buffered.c > > @@ -607,8 +607,6 @@ static int __bch2_writepage(struct folio *folio, > > if (w->io && > > (w->io->op.res.nr_replicas != nr_replicas_this_write || > > bio_full(&w->io->op.wbio.bio, sectors << 9) || > > - w->io->op.wbio.bio.bi_iter.bi_size + (sectors << 9) >= > > - (BIO_MAX_VECS * PAGE_SIZE) || > > bio_end_sector(&w->io->op.wbio.bio) != sector)) > > bch2_writepage_do_io(w); > > > > -- > > 2.41.0 > > >