On Tue, Jan 18, 2022 at 09:29:11AM +0100, Christian Brauner wrote: > On Mon, Jan 17, 2022 at 06:10:36PM +0000, Al Viro wrote: > > On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote: > > > > > IOW, ->free_inode() is RCU-delayed part of ->destroy_inode(). If both > > > are present, ->destroy_inode() will be called synchronously, followed > > > by ->free_inode() from RCU callback, so you can have both - moving just > > > the "finally mark for reuse" part into ->free_inode() would be OK. > > > Any blocking stuff (if any) can be left in ->destroy_inode()... > > > > BTW, we *do* have a problem with ext4 fast symlinks. Pathwalk assumes that > > strings it parses are not changing under it. There are rather delicate > > dances in dcache lookups re possibility of ->d_name contents changing under > > it, but the search key is assumed to be stable. > > > > What's more, there's a correctness issue even if we do not oops. Currently > > we do not recheck ->d_seq of symlink dentry when we dismiss the symlink from > > the stack. After all, we'd just finished traversing what used to be the > > contents of a symlink that used to be in the right place. It might have been > > unlinked while we'd been traversing it, but that's not a correctness issue. > > > > But that critically depends upon the contents not getting mangled. If it > > *can* be screwed by such unlink, we risk successful lookup leading to the > > Out of curiosity: whether or not it can get mangled depends on the > filesystem and how it implements fast symlinks or do fast symlinks > currently guarantee that contents are mangled? Not sure if I understand your question correctly... Filesystems should guarantee that the contents of string returned by ->get_link() (or pointed to by ->i_link) remains unchanged for as long as we are looking at it (until fs/namei.c:put_link() that drops it or fs/namei.c:drop_links() in the end of pathwalk). Fast symlinks or not - doesn't matter. The only cases where that does not hold (there are two of them in the entire kernel) happen to be fast symlinks. Both cases are bugs. ext4 case is actually easy to fix - the only reason it ends up mangling the string is the way ext4_truncate() implements its check for victim being a fast symlink (and thus needing no work). It gets disrupted by zeroing ->i_size, which we need to do in this case (inode removal). That's not hard to get right. A plenty of other fast symlink variants (starting with ext2 ones, BTW) do not step into anything of that sort. IIRC, we used to have at least some cases (orangefs, perhaps?) where revalidate on a symlink might (with confused or malicious server) end up modifying the contents, possibly right under somebody else walking that symlink. Also a bug...