On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote: > So renaming it to "lockref_get_active()", and changing the "not zero" > test to check for "positive" and change the rtype of "count" to be > signed, all sound like good things to me. Fine by me. I can do that, unless somebody else beats me to that. > But I don't actually think it's an active bug, it's just an "active > horribly ugly and subtly working code". I guess in theory if you can > get lots of CPU's triggering the race at the same time, the magic > negative number could become zero and positive, but at that point I > don't think we're really talking reality any more. > > Can somebody pick holes in that? Does somebody want to send in the > cleanup patch? FWIW, dget_parent() has been a bloody annoyance - most of the callers used to be racy and looking through the current set... Take a look at e.g. fs/btrfs/tree-log.c:check_parent_dirs_for_sync(). In the loop there we do if (!parent || !parent->d_inode || sb != parent->d_inode->i_sb) break; if (IS_ROOT(parent)) break; parent = dget_parent(parent); dput(old_parent); old_parent = parent; inode = parent->d_inode; and it's so bogus it's not even funny. What the hell is that code trying to check? And while we are at it, can we race with renames there? >From my reading of that code, most of it ought to have been under rcu_read_lock() and sod all that playing with refcounts. Other btrfs users are not much nicer (who says, for instance, that result of check_parent_dirs_for_sync() will remain valid?) ecryptfs lock_parent(): AFAICS, all callers could do ecryptfs_dentry_to_lower() on dentry->d_parent instead of doing ecryptfs_dentry_to_lower(dentry) and then doing dget_parent() - the covering layer dentries have ->d_parent stabilized by ->i_mutex on said parents and we really, really don't want to run into the case when tree topologies go out of sync. I.e. we want to grab ->i_mutex on lower(parent), then check that it's equal to parent(lower) and, if it's at all possible that some joker has played with rename(2) on underlying layer mounted somewhere else, drop everything we'd taken and bugger off. XFS xfs_filestream_get_parent() is clearly bogus - parent = dget_parent(dentry); if (!parent) goto out_dput; When does dget_parent() return NULL? And that's just from a quick grep. The point is, dget_parent() is not a nice API - historically it's been abused more often than not and we keep playing whack-a-mole with it. Sigh... Speaking of bogosities, apparently nobody has noticed that automagical turning BSD process accounting off upon r/o remounts of the filesystem hosting the log got broken several years ago. Suppose we find an opened log when remounting. OK, it gets closed, which means filp_close(...). Which means that actual closing that sucker will happen just before we return to userland. Return with -EBUSY, since the filesystem still has a file opened for write when we get around to checking that... I have kinda-sorta fixes for that (basically, restoring the situation we used to have before delayed fput() and being damn careful about avoiding deadlocks), but I would really love to just rip that idiocy out. IOW, just let it fail with -EBUSY in such cases, same as for any other write-opened file on that fs. That, BTW, is the most painful part of the acct series, so being able to drop it would be very nice. Comments? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html