Re: [PATCH 3/3] xfs: teach deferred op freezer to freeze and thaw inodes

On Tue, Apr 28, 2020 at 03:17:47PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 27, 2020 at 07:37:52AM -0400, Brian Foster wrote:
> > On Sat, Apr 25, 2020 at 12:01:37PM -0700, Christoph Hellwig wrote:
> > > On Tue, Apr 21, 2020 at 07:08:26PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > 
> > > > Make it so that the deferred operations freezer can save inode numbers
> > > > when we freeze the dfops chain, and turn them into pointers to incore
> > > > inodes when we thaw the dfops chain to finish them.  Next, add dfops
> > > > item freeze and thaw functions to the BUI/BUD items so that they can
> > > > take advantage of this new feature.  This fixes a UAF bug in the
> > > > deferred bunmapi code because xfs_bui_recover can schedule another BUI
> > > > to continue unmapping but drops the inode pointer immediately
> > > > afterwards.
> > > 
> > > I'm only looking over this for the first time, but why can't we just
> > > keep an inode reference around during recovery instead of this fairly
> > > complicated scheme to save the ino and then look it up again?
> > > 
> > 
> > I'm also a little confused about the use after free in the first place.
> > Doesn't xfs_bui_recover() look up the inode itself, or is the issue that
> > xfs_bui_recover() is fine but we might get into
> > xfs_bmap_update_finish_item() sometime later on the same inode without
> > any reference?
> 
> The second.  In practice it doesn't seem to trigger on the existing
> code, but the combination of atomic extent swap + fsstress + shutdown
> testing was enough to push it over the edge once due to reclaim.
> 
> > If the latter, similarly to Christoph I wonder if we
> > really could/should grab a reference on the inode for the intent itself,
> > even though that might not be necessary outside of recovery.
> 
> Outside of recovery we don't have the UAF problem because there's always
> something (usually the VFS dentry cache, but sometimes an explicit iget)
> that holds a reference to the inode for the duration of the transaction
> and dfops processing.
> 

Right, that's what I figured.

> One could just hang on to all incore inodes until the end of recovery
> like Christoph says, but the downside of doing it that way is that now
> we require enough memory to maintain all that incore state vs. only
> needing enough for the incore inodes involved in a particular dfops
> chain.  That isn't a huge deal now, but I was looking ahead to atomic
> extent swaps.
> 

What I was thinking above was tying the reference to the lifetime of the
intents associated with the inode, not necessarily the full lifetime of
recovery. It's not immediately clear to me if that indirectly leads to a
similar chain of in-core inodes due to unusual ordering of dfops chains
during recovery; ISTM that would mean a deviation from the typical
runtime dfops ordering, but perhaps I'm missing something...

That aside, based on your description above it seems we currently rely
on this icache retention behavior for recovery anyways, otherwise we'd
hit this use after free and probably have user reports. That suggests to
me that holding a reference is a logical next step, at least as a bug
fix patch to provide a more practical solution for stable/distro
kernels. For example, if we just associated an iget()/iput() with the
assignment of the xfs_bmap_intent->bi_owner field (and the eventual free
of the intent structure), would that technically solve the inode use
after free problem?
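
To make the idea concrete, here is a minimal userspace sketch of tying
an inode reference to the lifetime of an intent. It is only a model: the
toy_* names and the bare refcount are assumptions for illustration, not
the real XFS structures or the real iget()/iput() paths.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are the real XFS types or helpers. */
struct toy_inode {
	unsigned long	ino;
	int		refcount;	/* 0 means reclaim may free it */
};

struct toy_bmap_intent {
	struct toy_inode	*bi_owner;
};

/* Take a reference on an inode we already have a pointer to. */
struct toy_inode *toy_iget(struct toy_inode *ip)
{
	ip->refcount++;
	return ip;
}

/* Drop a reference previously taken with toy_iget(). */
void toy_iput(struct toy_inode *ip)
{
	assert(ip->refcount > 0);
	ip->refcount--;
}

/* Assigning bi_owner takes a reference that the intent itself owns. */
struct toy_bmap_intent *toy_intent_alloc(struct toy_inode *ip)
{
	struct toy_bmap_intent *bi = malloc(sizeof(*bi));

	if (!bi)
		return NULL;
	bi->bi_owner = toy_iget(ip);
	return bi;
}

/* Freeing the intent drops the reference taken at assignment. */
void toy_intent_free(struct toy_bmap_intent *bi)
{
	toy_iput(bi->bi_owner);
	free(bi);
}

/* Models whether reclaim would be allowed to tear the inode down. */
int toy_inode_reclaimable(const struct toy_inode *ip)
{
	return ip->refcount == 0;
}
```

In this model, the recovery code can drop its own reference right after
queueing the intent and the inode still stays pinned until
toy_intent_free() runs.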

BTW, I also wonder about the viability of changing ->bi_owner to an
xfs_ino_t instead of a direct pointer, but that might be more
involved than just adding a reference to the existing scheme...
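
For comparison, a rough sketch of the inode-number variant, again as a
userspace model only. The fixed-size inode_cache table, cache_iget(),
cache_irele() and ino_intent_finish() are all hypothetical names; the
point is just that the intent carries a number and a reference is held
only while the intent is being finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the real xfs_ino_t or xfs_inode. */
typedef unsigned long long model_ino_t;

struct cached_inode {
	model_ino_t	ino;
	int		refcount;
};

/* The intent records only the inode number, never a pointer. */
struct ino_intent {
	model_ino_t	owner_ino;
};

#define INODE_CACHE_SIZE 4
struct cached_inode inode_cache[INODE_CACHE_SIZE] = {
	{ 128, 0 }, { 129, 0 }, { 130, 0 }, { 131, 0 },
};

/* Look up an inode by number and return it with a reference held. */
struct cached_inode *cache_iget(model_ino_t ino)
{
	size_t i;

	for (i = 0; i < INODE_CACHE_SIZE; i++) {
		if (inode_cache[i].ino == ino) {
			inode_cache[i].refcount++;
			return &inode_cache[i];
		}
	}
	return NULL;
}

/* Release a reference obtained from cache_iget(). */
void cache_irele(struct cached_inode *ip)
{
	assert(ip->refcount > 0);
	ip->refcount--;
}

/* Finish an intent: grab the inode, do the work, release it again. */
int ino_intent_finish(const struct ino_intent *bi)
{
	struct cached_inode *ip = cache_iget(bi->owner_ino);

	if (!ip)
		return -1;	/* inode not found */
	/* ... the actual unmap work on ip would go here ... */
	cache_irele(ip);
	return 0;
}
```

The cost in this model is one extra lookup per finished intent, in
exchange for never storing a pointer that can go stale.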

Brian

> (And, yeah, I should put that series on the list now...)
> 
> > Either way, more details about the problem being fixed in the commit log
> > would be helpful.
> 
> <nod>
> 
> --D
> 
> > Brian
> > 
> 



