On Tue, Jun 22, 2021 at 01:16:04PM +0200, David Sterba wrote:
> On Tue, Jun 01, 2021 at 03:45:08PM -0400, Josef Bacik wrote:
> > We have been hitting some early ENOSPC issues in production with more
> > recent kernels, and I tracked it down to us simply not flushing delalloc
> > as aggressively as we should be. With tracing I was seeing us failing
> > all tickets with all of the block rsvs at or around 0, with very little
> > pinned space, but still around 120mib of outstanding bytes_may_used.
> > Upon further investigation I saw that we were flushing around 14 pages
> > per shrink call for delalloc, despite having around 2gib of delalloc
> > outstanding.
> >
> > Consider the example of a 8 way machine, all cpu's trying to create a
> > file in parallel, which at the time of this commit requires 5 items to
> > do. Assuming a 16k leaf size, we have 10mib of total metadata reclaim
> > size waiting on reservations. Now assume we have 128mib of delalloc
> > outstanding. With our current math we would set items to 20, and then
> > set to_reclaim to 20 * 256k, or 5mib.
> >
> > Assuming that we went through this loop all 3 times, for both
> > FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop
> > twice, we'd only flush 60mib of the 128mib delalloc space. This could
> > leave a fair bit of delalloc reservations still hanging around by the
> > time we go to ENOSPC out all the remaining tickets.
> >
> > Fix this two ways. First, change the calculations to be a fraction of
> > the total delalloc bytes on the system. Prior to my change we were
> > calculating based on dirty inodes so our math made more sense, now it's
> > just completely unrelated to what we're actually doing.
> >
> > Second add a FLUSH_DELALLOC_FULL state, that we hold off until we've
> > gone through the flush states at least once. This will empty the system
> > of all delalloc so we're sure to be truly out of space when we start
> > failing tickets.
> >
> > I'm tagging stable 5.10 and forward, because this is where we started
> > using the page stuff heavily again. This affects earlier kernel
> > versions as well, but would be a pain to backport to them as the
> > flushing mechanisms aren't the same.
> >
> > CC: stable@xxxxxxxxxxxxxxx # 5.10
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>
> As this is going to be resent, I'll remove it from misc-next for now.
> Updated version can go in as a fix after rc1.

Ok so that does not work, the patchset "[PATCH 0/4][v2] btrfs: commit the
transaction unconditionally for ensopc"
https://lore.kernel.org/linux-btrfs/cover.1623421213.git.josef@xxxxxxxxxxxxxx/
touches the defines and can't be trivially resolved.
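
For anyone following the numbers in the quoted commit message, here is a
rough sketch that just replays the arithmetic. It is not kernel code. The
function and variable names are made up for illustration, and the
"leaf size * 8 levels * 2" per-item cost is my assumption about where the
256k and 10mib figures come from (it happens to reproduce both); the items
value of 20 is taken from the message as given.

```python
KIB = 1024
MIB = 1024 * KIB


def metadata_reclaim_bytes(cpus, items_per_create, leaf_size, max_level=8):
    """Total metadata reservation waiting, one file create per CPU.

    Assumption: each item reserves leaf_size * max_level * 2 bytes,
    which gives 256 KiB per item for a 16 KiB leaf.
    """
    per_item = leaf_size * max_level * 2
    return cpus * items_per_create * per_item


def flushed_before_enospc(items, per_item_bytes, loops_per_state,
                          delalloc_states, full_passes):
    """Upper bound on delalloc flushed before tickets start failing."""
    to_reclaim = items * per_item_bytes  # 20 * 256k = 5mib in the example
    return to_reclaim * loops_per_state * delalloc_states * full_passes


reserve = metadata_reclaim_bytes(cpus=8, items_per_create=5,
                                 leaf_size=16 * KIB)
flushed = flushed_before_enospc(items=20, per_item_bytes=256 * KIB,
                                loops_per_state=3,   # "this loop all 3 times"
                                delalloc_states=2,   # FLUSH_DELALLOC, FLUSH_DELALLOC_WAIT
                                full_passes=2)       # "the full loop twice"
delalloc = 128 * MIB

print(reserve // MIB, "mib waiting on reservations")
print(flushed // MIB, "mib flushed of", delalloc // MIB, "mib delalloc")
print((delalloc - flushed) // MIB, "mib of delalloc left when ENOSPC is returned")
```

With the example inputs this leaves more than half of the outstanding
delalloc unflushed, which is the gap the FLUSH_DELALLOC_FULL state is
meant to close.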