On Wed, Sep 23, 2020 at 08:20:15AM +0100, Christoph Hellwig wrote:
> On Wed, Sep 16, 2020 at 08:29:42PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> > In xfs_bui_item_recover, there exists a use-after-free bug with regards
> > to the inode that is involved in the bmap replay operation.  If the
> > mapping operation does not complete, we call xfs_bmap_unmap_extent to
> > create a deferred op to finish the unmapping work, and we retain a
> > pointer to the incore inode.
> >
> > Unfortunately, the very next thing we do is commit the transaction and
> > drop the inode.  If reclaim tears down the inode before we try to finish
> > the defer ops, we dereference garbage and blow up.  Therefore, create a
> > way to join inodes to the defer ops freezer so that we can maintain the
> > xfs_inode reference until we're done with the inode.
> >
> > Note: This imposes the requirement that there be enough memory to keep
> > every incore inode in memory throughout recovery.
>
> As in every inode that gets recovered, not every inode in the system.
> I think the commit log could use a very slight tweak here.
>
> Didn't we think of just storing the inode number for recovery, or
> did this turn out too complicated?  (I'm pretty sure we discussed this
> in detail before, but my memory gets foggy).

Initially I did just store the inode numbers, but that made the code
more clunky due to needing more _iget and _irele calls, and a bunch of
error handling for that.  Dave suggested on IRC that I should retain the
reference to the incore inode to simplify the initial patch, and if we
run into ENOMEM then we can fix it later.

I wasn't 100% convinced of that, but Dave or Brian or someone (memory
foggy, don't remember who) countered that the system recovering the fs
is usually the system that crashed in the first place, and it certainly
had enough RAM to hold the inodes.

I think the link you're looking for is [1].

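To illustrate the trade-off described above, here is a userspace toy
sketch of the two approaches.  It is not the XFS code and not the patch
under review: toy_iget, toy_irele, the toy_inode cache, and both work
structs are hypothetical stand-ins invented for this example.  The point
it tries to show is only that remembering an inode number adds a lookup
(and therefore an error path) at finish time, while holding a reference
keeps the inode pinned and leaves only the release.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for an incore inode; NOT the real struct xfs_inode. */
struct toy_inode {
	unsigned long long ino;
	int refcount;		/* 0 means nothing pins it in memory */
};

#define TOY_NR_INODES 4
static struct toy_inode toy_cache[TOY_NR_INODES] = {
	{ 100, 0 }, { 101, 0 }, { 102, 0 }, { 103, 0 },
};

/* Hypothetical lookup: take a reference by inode number, or fail. */
static int toy_iget(unsigned long long ino, struct toy_inode **ipp)
{
	for (size_t i = 0; i < TOY_NR_INODES; i++) {
		if (toy_cache[i].ino == ino) {
			toy_cache[i].refcount++;
			*ipp = &toy_cache[i];
			return 0;
		}
	}
	return -ENOENT;
}

/* Hypothetical release: drop one reference. */
static void toy_irele(struct toy_inode *ip)
{
	assert(ip->refcount > 0);
	ip->refcount--;
}

/* Approach A: the deferred work remembers only the inode number. */
struct toy_work_by_ino {
	unsigned long long ino;
};

static int toy_finish_by_ino(const struct toy_work_by_ino *w)
{
	struct toy_inode *ip;
	int error = toy_iget(w->ino, &ip);	/* extra lookup ... */

	if (error)
		return error;			/* ... and extra error path */
	/* the deferred mapping work on ip would happen here */
	toy_irele(ip);
	return 0;
}

/* Approach B: the deferred work holds a reference the whole time. */
struct toy_work_by_ref {
	struct toy_inode *ip;
};

static int toy_capture_by_ref(struct toy_work_by_ref *w,
			      unsigned long long ino)
{
	return toy_iget(ino, &w->ip);	/* inode stays pinned from here on */
}

static void toy_finish_by_ref(struct toy_work_by_ref *w)
{
	/* the deferred mapping work on w->ip would happen here; no lookup */
	toy_irele(w->ip);
	w->ip = NULL;
}
```

In this toy model, approach B cannot fail at finish time, at the cost of
keeping every captured inode's refcount above zero until its work item is
finished, which mirrors the memory requirement noted in the commit log.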
--D

[1] https://lore.kernel.org/linux-xfs/158864123329.184729.14504239314355330619.stgit@magnolia/