On Tue, Jun 01, 2021 at 03:45:08PM -0400, Josef Bacik wrote:
> We have been hitting some early ENOSPC issues in production with more
> recent kernels, and I tracked it down to us simply not flushing delalloc
> as aggressively as we should be.  With tracing I was seeing us failing
> all tickets with all of the block rsvs at or around 0, with very little
> pinned space, but still around 120MiB of outstanding bytes_may_use.
> Upon further investigation I saw that we were flushing around 14 pages
> per shrink call for delalloc, despite having around 2GiB of delalloc
> outstanding.
>
> Consider the example of an 8-way machine, with all CPUs trying to
> create a file in parallel, which at the time of this commit requires 5
> items to do.  Assuming a 16KiB leaf size, we have 10MiB of total
> metadata reclaim size waiting on reservations.  Now assume we have
> 128MiB of delalloc outstanding.  With our current math we would set
> items to 20, and then set to_reclaim to 20 * 256KiB, or 5MiB.
>
> Assuming that we went through this loop all 3 times, for both
> FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop
> twice, we'd only flush 60MiB of the 128MiB of delalloc space.  This
> could leave a fair bit of delalloc reservations still hanging around by
> the time we go to ENOSPC out all the remaining tickets.
>
> Fix this in two ways.  First, change the calculations to be a fraction
> of the total delalloc bytes on the system.  Prior to my change we were
> calculating based on dirty inodes, so our math made more sense; now
> it's just completely unrelated to what we're actually doing.
>
> Second, add a FLUSH_DELALLOC_FULL state that we hold off until we've
> gone through the flush states at least once.  This will empty the
> system of all delalloc, so we're sure to be truly out of space when we
> start failing tickets.
>
> I'm tagging stable 5.10 and forward, because this is where we started
> using the page stuff heavily again.  This affects earlier kernel
> versions as well, but would be a pain to backport to them as the
> flushing mechanisms aren't the same.
>
> CC: stable@xxxxxxxxxxxxxxx # 5.10
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

As this is going to be resent, I'll remove it from misc-next for now.
The updated version can go in as a fix after rc1.
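
To make the arithmetic in the quoted message concrete, here is a minimal
standalone sketch (userspace C, not the actual btrfs patch) using the
numbers from the example above.  The 1/8 fraction of total delalloc is
an assumption for illustration; the exact scaling in the final patch may
differ.

#include <stdio.h>
#include <stdint.h>

#define SZ_256K  (256ULL * 1024)
#define SZ_1M    (1024ULL * 1024)

int main(void)
{
	uint64_t delalloc_bytes = 128 * SZ_1M;  /* outstanding delalloc */
	uint64_t old_target, new_target;

	/* Old math: a fixed number of items, each costing 256KiB. */
	old_target = 20 * SZ_256K;              /* 5MiB per pass */

	/*
	 * New math (sketch): never aim for less than a fraction of the
	 * total delalloc on the system, so the target tracks what is
	 * actually outstanding rather than the dirty-inode count.
	 */
	new_target = old_target;
	if (new_target < delalloc_bytes / 8)
		new_target = delalloc_bytes / 8;

	printf("old per-pass target: %llu MiB\n",
	       (unsigned long long)(old_target / SZ_1M));
	printf("new per-pass target: %llu MiB\n",
	       (unsigned long long)(new_target / SZ_1M));

	/* 2 states x 3 loops x 2 full passes = 12 passes under old math: */
	printf("old math, 12 passes: %llu of %llu MiB flushed\n",
	       (unsigned long long)(12 * old_target / SZ_1M),
	       (unsigned long long)(delalloc_bytes / SZ_1M));
	return 0;
}

Run as written, this reproduces the 60MiB-of-128MiB shortfall from the
commit message, and shows how the fraction-based target closes the gap.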
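The second half of the fix is the new flush state.  Below is an
illustrative state-machine sketch (again standalone C, not the btrfs
code) of the ordering described above: FLUSH_DELALLOC_FULL sits after
FLUSH_DELALLOC_WAIT but is skipped until the other states have had one
full pass.  The state names mirror btrfs; the reduced state list and the
two-pass loop structure are assumptions for illustration.

#include <stdio.h>

enum flush_state {
	FLUSH_DELAYED_ITEMS,
	FLUSH_DELALLOC,
	FLUSH_DELALLOC_WAIT,
	FLUSH_DELALLOC_FULL,	/* new: empty all remaining delalloc */
	COMMIT_TRANS,
	NR_STATES,
};

static const char *const names[] = {
	"FLUSH_DELAYED_ITEMS", "FLUSH_DELALLOC", "FLUSH_DELALLOC_WAIT",
	"FLUSH_DELALLOC_FULL", "COMMIT_TRANS",
};

int main(void)
{
	int pass, state;

	for (pass = 0; pass < 2; pass++) {
		for (state = 0; state < NR_STATES; state++) {
			/* Hold the full flush off until the second pass. */
			if (state == FLUSH_DELALLOC_FULL && pass == 0)
				continue;
			printf("pass %d: %s\n", pass, names[state]);
		}
	}
	return 0;
}

The point of the ordering is that the cheaper states get a chance first;
only when they have all run and tickets are still unsatisfied does the
system pay for a full delalloc flush, so an eventual ENOSPC really means
out of space.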