Re: [RFC PATCH v2] xfs: run blockgc on freeze to avoid iget stalls after reclaim

On Mon, Feb 05, 2024 at 02:07:27PM -0800, Darrick J. Wong wrote:
> On Sun, Feb 04, 2024 at 11:03:07AM -0500, Brian Foster wrote:
> > On Fri, Feb 02, 2024 at 03:33:43PM -0800, Darrick J. Wong wrote:
> > > On Fri, Feb 02, 2024 at 02:41:56PM -0500, Brian Foster wrote:
> > > > On Thu, Feb 01, 2024 at 12:16:03PM +1100, Dave Chinner wrote:
> > > > > On Mon, Jan 29, 2024 at 10:02:16AM -0500, Brian Foster wrote:
> > > > > > On Fri, Jan 26, 2024 at 10:46:12AM +1100, Dave Chinner wrote:
> > > > > > > On Thu, Jan 25, 2024 at 07:46:55AM -0500, Brian Foster wrote:
> > > > > > > > On Mon, Jan 22, 2024 at 02:23:44PM +1100, Dave Chinner wrote:
> > > > > > > > > On Fri, Jan 19, 2024 at 02:36:45PM -0500, Brian Foster wrote:
> > > > ...
> > > > > Here's the fixes for the iget vs inactive vs freeze problems in the
> > > > > upstream kernel:
> > > > > 
> > > > > https://lore.kernel.org/linux-xfs/20240201005217.1011010-1-david@xxxxxxxxxxxxx/T/#t
> > > > > 
> > > > > With that sorted, are there any other issues we know about that
> > > > > running a blockgc scan during freeze might work around?
> > > > > 
> > > > 
> > > > The primary motivation for the scan patch was the downstream/stable
> > > > deadlock issue. The reason I posted it upstream is because when I
> > > > considered the overall behavior change, I thought it uniformly
> > > > beneficial to both contexts based on the (minor) benefits of the side
> > > > effects of the scan. You don't need me to enumerate them, and none of
> > > > them are uniquely important or worth overanalyzing.
> > > > 
> > > > The only real question that matters here is do you agree with the
> > > > general reasoning for a blockgc scan during freeze, or shall I drop the
> > > > patch?
> > > 
> > 
> > Hi Darrick,
> > 
> > > I don't see any particular downside to flushing {block,inode}gc work
> > > during a freeze, other than the loss of speculative preallocations
> > > sounds painful.
> > > 
> > 
> > Yeah, that's definitely a tradeoff. The more I thought about it, the
> > more ISTM that any workload sensitive enough to notice the penalty of
> > an extra blockgc scan is unlikely to see freeze cycles all that
> > often.
> > 
> > I suspect the same applies to the bit of extra work added to the
> > freeze as well, but maybe there's some more painful scenario..?
> 
> <shrug> I suppose if you had a few gigabytes of speculative
> preallocations across a bunch of log files (or log structured tree
> files, or whatever) then you could lose those preallocations and make
> fragmentation worse.  Since blockgc can run on open files, maybe we
> should leave that out of the freeze preparation syncfs?
> 

By "leave that out," do you mean leave out the blockgc scan on freeze,
or use a special mode that explicitly skips over opened/writeable files?

FWIW, this sounds more like a generic improvement to the background scan
to me. Background blockgc currently filters inodes on criteria like
whether the file is dirty in pagecache. If you have a log file or
something, I would think the regular background scan may end up
processing such files more frequently than a freeze-induced one will..?
And for anything that isn't under active or continuous modification,
freeze is already going to flush everything out for the first
post-unfreeze background scan to take care of.
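
For reference, the existing non-sync filter is roughly the following
(paraphrased from fs/xfs/xfs_icache.c, so treat it as a sketch rather
than a verbatim quote):

	/*
	 * Background (non-sync) scans skip inodes with dirty pagecache
	 * and leave them for a later pass; sync scans process them.
	 */
	if (!(icw->icw_flags & XFS_ICWALK_FLAG_SYNC) &&
	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY))
		return 0;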

So I dunno, I think I agree and disagree. :) I think it would be
perfectly reasonable to add an open/writeable file filter check to the
regular background scan to make it less aggressive. This patch does
invoke the background scan, but only because of the wonky
read-into-a-mapped-buffer use case. I still think freeze should
(eventually) invoke the more aggressive sync scan and process all
pending work before quiesce, rather than alter behavior based on
heuristics.
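
For the filter idea, a minimal sketch of what I mean is below;
xfs_blockgc_skip_open_file() is a made-up name, and the i_writecount
test is just one way to approximate "open for write":

	/*
	 * Hypothetical: have background blockgc skip inodes that are
	 * currently open for write so long-lived log files keep their
	 * speculative preallocation. Sync scans still process them.
	 */
	static bool
	xfs_blockgc_skip_open_file(
		struct xfs_inode	*ip,
		struct xfs_icwalk	*icw)
	{
		if (icw && (icw->icw_flags & XFS_ICWALK_FLAG_SYNC))
			return false;
		/* any writable open file bumps i_writecount */
		return atomic_read(&VFS_I(ip)->i_writecount) > 0;
	}

And IIRC the aggressive sync variant already exists for the ENOSPC
paths (xfs_blockgc_flush_all()), so the freeze path could in principle
call that before quiesce.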

> OTOH most of the inodes on those lists are not open at all, so perhaps
> we /should/ kick inodegc while preparing for freeze?  Such a patch could
> reuse the justification below after s/blockgc/inodegc/.  Too bad we
> didn't think far enough into the FIFREEZE design to allow userspace to
> indicate if they want us to minimize freeze time or post-freeze
> recovery time.
> 

Yeah, I think this ties in interestingly with the security/post-freeze
drop-caches thing Christian brought up on fsdevel recently. It would be
ideal if freeze had some controls that different use cases could use to
suggest how aggressive (or not) to be with such things. Maybe that
somewhat relates to the per-stage ->freeze_fs() prototype I posted
earlier in the thread [1] to help support running a sync scan.
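
Roughly (from memory; see [1] for the actual prototype), the idea is to
pass the vfs freeze level through to the fs so it can run work at the
right point. Something like the following, where the
SB_FREEZE_PAGEFAULT placement and the xfs_blockgc_flush_all() call are
illustrative assumptions, not the posted patch:

	/* hypothetical per-stage callback; see [1] for the real one */
	int (*freeze_fs)(struct super_block *sb, int freeze_level);

	/*
	 * e.g. in the XFS implementation, run a sync blockgc scan once
	 * new writes are blocked but before internal quiesce:
	 */
	if (freeze_level == SB_FREEZE_PAGEFAULT)
		error = xfs_blockgc_flush_all(mp);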

Given the current implementation, I think ultimately it just depends on
your perspective of what freeze is supposed to do. To me, it should
reliably put the filesystem into a predictable on-disk state (based on
the common snapshot use case). It is a big hammer that should be
scheduled with care wrt any performance-sensitive workloads, and should
be minimally disruptive to the system when held for a
non-deterministic/extended amount of time. Departures from that are
either optimizations or extra features/modifiers that we currently
don't have a great way to control. Just my .02.

Brian

[1] Appended to this reply:
  https://lore.kernel.org/linux-xfs/ZbJYP63PgykS1CwU@bfoster/

> --D
> 
> > > Does Dave's patchset to recycle NEEDS_INACTIVE inodes eliminate the
> > > stall problem?
> > > 
> > 
> > I assume it does. I think some of the confusion here is that I probably
> > would have gone in a slightly different direction on that issue, but
> > that's a separate discussion.
> > 
> > As it relates to this patch, in hindsight I probably should have
> > rewritten the commit log from the previous version. If I were to do that
> > now, it might read more like this (factoring out sync vs. non-sync
> > nuance and whatnot):
> > 
> > "
> > xfs: run blockgc on freeze to keep inodes off the inactivation queues
> > 
> > blockgc processing is disabled when the filesystem is frozen. This means
> > <words words words about blockgc> ...
> > 
> > Rather than hold pending blockgc inodes in inactivation queues when
> > frozen, run a blockgc scan during the freeze sequence to process this
> > subset of inodes up front. This allows reclaim to potentially free these
> > inodes while frozen (by keeping them off inactive lists) and produces a
> > more predictable/consistent on-disk freeze state. The latter is
> > potentially beneficial for snapshots, as this means no dangling post-eof
> > preallocs or cowblock recovery.
> > 
> > Potential tradeoffs for this are <yadda yadda, more words from above>
> > ...
> > "
> > 
> > ... but again, the primary motivation for this was still the whole
> > deadlock thing. I think it's perfectly reasonable to look at this change
> > and say it's not worth it. Thanks for the feedback.
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > > -Dave.
> > > > > -- 
> > > > > Dave Chinner
> > > > > david@xxxxxxxxxxxxx
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> 
