On Tue, Jan 18, 2022 at 04:04:50PM +0000, Al Viro wrote: > On Tue, Jan 18, 2022 at 09:29:11AM +0100, Christian Brauner wrote: > > On Mon, Jan 17, 2022 at 06:10:36PM +0000, Al Viro wrote: > > > On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote: > > > > > > > IOW, ->free_inode() is RCU-delayed part of ->destroy_inode(). If both > > > > are present, ->destroy_inode() will be called synchronously, followed > > > > by ->free_inode() from RCU callback, so you can have both - moving just > > > > the "finally mark for reuse" part into ->free_inode() would be OK. > > > > Any blocking stuff (if any) can be left in ->destroy_inode()... > > > > > > BTW, we *do* have a problem with ext4 fast symlinks. Pathwalk assumes that > > > strings it parses are not changing under it. There are rather delicate > > > dances in dcache lookups re possibility of ->d_name contents changing under > > > it, but the search key is assumed to be stable. > > > > > > What's more, there's a correctness issue even if we do not oops. Currently > > > we do not recheck ->d_seq of symlink dentry when we dismiss the symlink from > > > the stack. After all, we'd just finished traversing what used to be the > > > contents of a symlink that used to be in the right place. It might have been > > > unlinked while we'd been traversing it, but that's not a correctness issue. > > > > > > But that critically depends upon the contents not getting mangled. If it > > > *can* be screwed by such unlink, we risk successful lookup leading to the > > > > Out of curiosity: whether or not it can get mangled depends on the > > filesystem and how it implements fast symlinks or do fast symlinks > > currently guarantee that contents are mangled? > > Not sure if I understand your question correctly... > > Filesystems should guarantee that the contents of string returned > by ->get_link() (or pointed to by ->i_link) remains unchanged for as long > as we are looking at it (until fs/namei.c:put_link() that drops it or > fs/namei.c:drop_links() in the end of pathwalk). Fast symlinks or not - > doesn't matter. Yep, got that. > > The only cases where that does not hold (there are two of them in > the entire kernel) happen to be fast symlinks. Both cases are bugs. Ok, that's what I was essentially after whether or not they were bugs in the filesystems or it's a generic bug. > ext4 case is actually easy to fix - the only reason it ends up mangling > the string is the way ext4_truncate() implements its check for victim > being a fast symlink (and thus needing no work). It gets disrupted > by zeroing ->i_size, which we need to do in this case (inode removal). > That's not hard to get right. Oh, I see, it zeroes i_size and erases i_data which obviously tramples the fast symlink contents. Given that ext4 makes use of i_flags for their ext4 inode containers why couldn't this just be sm like #define EXT4_FAST_SYMLINK 0x<some-free-value> if (EXT4_I(inode)->i_flags & EXT4_FAST_SYMLINK) return <im-a-fast-symlink>; ? Which seems simpler and more obvious to someone reading that code than logic based on substracting blocks or checking i_size.