On Wed, Apr 29, 2020 at 04:48:19AM -0700, Christoph Hellwig wrote: > On Wed, Apr 29, 2020 at 07:38:03AM -0400, Brian Foster wrote: > > That aside, based on your description above it seems we currently rely > > on this icache retention behavior for recovery anyways, otherwise we'd > > hit this use after free and probably have user reports. That suggests to > > me that holding a reference is a logical next step, at least as a bug > > fix patch to provide a more practical solution for stable/distro > > kernels. For example, if we just associated an iget()/iput() with the > > assignment of the xfs_bmap_intent->bi_owner field (and the eventual free > > of the intent structure), would that technically solve the inode use > > after free problem? > > Yes, that's what I thought. > > > > > BTW, I also wonder about the viability of changing ->bi_owner to an > > xfs_ino_t instead of a direct pointer, but that might be more > > involved than just adding a reference to the existing scheme... > > It is actually pretty easy, but I'm not sure if hitting the icache for > every finished bmap item is all that desirable. It came with a noticeable (~2%) slowdown on a swapext-heavy fsstress run, which was my motivation for this (somewhat clunky) system for avoiding all that overhead except for recovery. Hmm. Actually now that I think harder about it, the bmap item is completely incore and fields are selectively copied to the log item. This means that regular IO could set bi_owner = <some inode number> and bi_ip = <the incore inode>. Recovery IO can set bi_owner but leave bi_ip NULL, and then the bmap item replay can iget as needed. Now we don't need this freeze/thaw thing at all. --D