On Fri, Feb 02, 2024 at 03:33:43PM -0800, Darrick J. Wong wrote:
> On Fri, Feb 02, 2024 at 02:41:56PM -0500, Brian Foster wrote:
> > On Thu, Feb 01, 2024 at 12:16:03PM +1100, Dave Chinner wrote:
> > > On Mon, Jan 29, 2024 at 10:02:16AM -0500, Brian Foster wrote:
> > > > On Fri, Jan 26, 2024 at 10:46:12AM +1100, Dave Chinner wrote:
> > > > > On Thu, Jan 25, 2024 at 07:46:55AM -0500, Brian Foster wrote:
> > > > > > On Mon, Jan 22, 2024 at 02:23:44PM +1100, Dave Chinner wrote:
> > > > > > > On Fri, Jan 19, 2024 at 02:36:45PM -0500, Brian Foster wrote:
> > ...
> > > Here's the fixes for the iget vs inactive vs freeze problems in the
> > > upstream kernel:
> > > 
> > > https://lore.kernel.org/linux-xfs/20240201005217.1011010-1-david@xxxxxxxxxxxxx/T/#t
> > > 
> > > With that sorted, are there any other issues we know about that
> > > running a blockgc scan during freeze might work around?
> > > 
> > 
> > The primary motivation for the scan patch was the downstream/stable
> > deadlock issue. The reason I posted it upstream is because when I
> > considered the overall behavior change, I thought it uniformly
> > beneficial to both contexts based on the (minor) benefits of the side
> > effects of the scan. You don't need me to enumerate them, and none of
> > them are uniquely important or worth overanalyzing.
> > 
> > The only real question that matters here is do you agree with the
> > general reasoning for a blockgc scan during freeze, or shall I drop the
> > patch?
> 

Hi Darrick,

> I don't see any particular downside to flushing {block,inode}gc work
> during a freeze, other than the loss of speculative preallocations
> sounds painful.
> 

Yeah, that's definitely a tradeoff. The more I thought about it, the
more ISTM that any workload sensitive enough to notice the penalty of
an extra blockgc scan is unlikely to see freeze cycles all that often.
I suspect the same applies to the bit of extra work added to the freeze
as well, but maybe there's some more painful scenario...?

> Does Dave's patchset to recycle NEEDS_INACTIVE inodes eliminate the
> stall problem?
> 

I assume it does. I think some of the confusion here is that I probably
would have gone in a slightly different direction on that issue, but
that's a separate discussion.

As it relates to this patch, in hindsight I probably should have
rewritten the commit log from the previous version. If I were to do
that now, it might read more like this (factoring out the sync vs.
non-sync nuance and whatnot):

"
xfs: run blockgc on freeze to keep inodes off the inactivation queues

blockgc processing is disabled when the filesystem is frozen. This
means <words words words about blockgc> ...

Rather than hold pending blockgc inodes in inactivation queues while
frozen, run a blockgc scan during the freeze sequence to process this
subset of inodes up front. This allows reclaim to potentially free
these inodes while frozen (by keeping them off inactive lists) and
produces a more predictable/consistent on-disk freeze state. The
latter is potentially beneficial for snapshots, as it means no
dangling post-eof preallocs or cowblock recovery.

Potential tradeoffs for this are <yadda yadda, more words from
above> ...
"

... but again, the primary motivation for this was still the whole
deadlock thing. I think it's perfectly reasonable to look at this
change and say it's not worth it.

Thanks for the feedback.

Brian

> --D
> 
> > Brian
> > 
> > > -Dave.
> > > -- 
> > > Dave Chinner
> > > david@xxxxxxxxxxxxx
> > > 
> > 
> 
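For reference, the shape of the change being discussed is roughly the
following. This is only a sketch against current mainline (hooking the
SB_FREEZE_PAGEFAULT transition in xfs_fs_sync_fs(), where blockgc is
already shut down today, and reusing the existing synchronous
xfs_blockgc_flush_all() helper); the call site and error handling here
are assumptions for illustration, not the actual patch:

	/*
	 * Sketch only: run a synchronous blockgc scan before blockgc is
	 * shut down for the freeze, so inodes with post-EOF preallocs or
	 * CoW leftovers are trimmed up front rather than parked on the
	 * inactivation queues for the duration of the freeze.
	 */
	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
		error = xfs_blockgc_flush_all(mp);	/* sync scan */
		if (error)
			return error;
		xfs_inodegc_stop(mp);
		xfs_blockgc_stop(mp);
	}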