On Sun, May 06, 2018 at 01:59:51AM +0100, Al Viro wrote: > > There is nothing at the moment that needs fixing. > > Funny, that... I'd been going through the damn thing for the > last week or so; open-by-fhandle/nfs export support is completely > buggered. And as for the rest... the least said about the error > handling, the better - something like rename() hitting an IO > error (read one) can not only screw the on-disk data into the > ground, it can do seriously bad things to kernel data structures. ... and while we are at it, consider the following: mkdir a mkdir b touch a/x ln a/x b <umount and mount again, to get cold dcache, or just wait for memory pressure to evict those suckers> process A: unlink("a/x"); process B: open("b/x"); We had two entries - one in a, another in b; the one in a is ST_FILE one, the one in b - ST_LINKFILE, with ->original refering to the ST_FILE one. unlink() needs to kick the entry out of a; since it can't leave an ST_FILE not included into any directory (and since the file should live on due to b/x still being there) it ends up removing ST_LINKFILE entry from b and moving ST_FILE one in its place. That happens in affs_remove_link(). However, open() gets there just as this surgery is about to begin. It gets to affs_lookup(), which finds the entry for b/x, reads it, stashes block number of that entry into dentry->d_fsdata, notices that it's ST_LINKFILE one, picks the block number of ST_FILE one out of it and proceeds to look up the inode; that doesn't end up doing any work (the same inode is in icache due to a/x having been looked up by unlink(2)). In the meanwhile, since the hash table of b has been unlocked once we'd done hash lookup, affs_remove_link() can proceed with the surgery. It *does* try to catch dentries with ->d_fsdata containing the block number of sacrificed ST_LINKFILE and reset that to block number of ST_FILE that gets moved in its place. However, it does so by static void affs_fix_dcache(struct inode *inode, u32 entry_ino) { struct dentry *dentry; spin_lock(&inode->i_lock); hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) { if (entry_ino == (u32)(long)dentry->d_fsdata) { dentry->d_fsdata = (void *)inode->i_ino; break; } } spin_unlock(&inode->i_lock); } i.e. looping through dentries that point to our inode. Except that *our* dentry isn't attached to inode yet, so we are SOL - it's left with ->d_fsdata pointing to (destroyed) ST_LINKFILE. After a while process B closes the file and unlinks it. Take a look at affs_remove_header() and you'll see how much fun we are in for - it uses ->d_fsdata to pick the entry to find and remove... That's an fs corruptor, plain and simple. As far as I had been able to reconstruct the history, leftover after Roman's locking changes 17 years ago. AFAICS, I'd missed it back then - the races I'd spotted had been dealt with about a year later (when we started to lock the victim in vfs_unlink/vfs_rmdir/vfs_rename), but this one went unnoticed...