On Tue, 2022-01-18 at 01:32 +0000, Al Viro wrote: > On Mon, Jan 17, 2022 at 07:48:49PM +0000, Al Viro wrote: > > > But that critically depends upon the contents not getting > > > mangled. If it > > > *can* be screwed by such unlink, we risk successful lookup > > > leading to the > > > wrong place, with nothing to tell us that it's happening. We > > > could handle > > > that by adding a check to fs/namei.c:put_link(), and propagating > > > the error > > > to callers. It's not impossible, but it won't be pretty. > > > > > > And that assumes we avoid oopsen on string changing under us in > > > the first > > > place. Which might or might not be true - I hadn't finished the > > > audit yet. > > > Note that it's *NOT* just fs/namei.c + fs/dcache.c + some fs > > > methods - > > > we need to make sure that e.g. everything called by ->d_hash() > > > instances > > > is OK with strings changing right under them. Including > > > utf8_to_utf32(), > > > crc32_le(), utf8_casefold_hash(), etc. > > > > And AFAICS, ext4, xfs and possibly ubifs (I'm unfamiliar with that > > one and > > the call chains there are deep enough for me to miss something) > > have the > > "bugger the contents of string returned by RCU ->get_link() if > > unlink() > > happens" problem. > > > > I would very much prefer to have them deal with that crap, > > especially > > since I don't see why does ext4_evict_inode() need to do that > > memset() - > > can't we simply check ->i_op in ext4_can_truncate() and be done > > with > > that? > > This reuse-without-delay has another fun side, AFAICS. Suppose the > new use > for inode comes with the same ->i_op (i.e. it's a symlink again) and > it > happens right after ->get_link() has returned the pointer to body. > > We are already past whatever checks we might add in pick_link(). And > the > pointer is still valid. So we end up quietly traversing the body of > completely unrelated symlink that never had been anywhere near any > directory > we might be looking at. With no indication of anything going wrong - > just > a successful resolution with bogus result. Wouldn't that case be caught by the unlazy call since ->get_link() needs to return -ECHILD for the rcu case now (in xfs anyway)? > > Could XFS folks explain what exactly goes wrong if we make actual > marking > inode as ready for reuse RCU-delayed, by shifting just that into > ->free_inode()? Why would we need any extra synchronize_rcu() > anywhere?