On Mon, Feb 29, 2016 at 01:34:13PM +0100, Dmitry Vyukov wrote: > It's not that I really understand what happens here, but looking at > the diff: is it the case that negative and inode can change under our > feet? If so, we still probably can get an inconsistent picture (i.e. > negative dentry but not NULL inode), can it be an issue? Is > non-negative->negative->non-negative->negative transition possible? If > so, we still probably can get the same crash regardless of order of > negative/inode loads. Yes, we can - relying on the ordering is brittle and wrong. See other posting for possible solution; at least that one has much more understandable rules: * ->d_seq is bumped before and after modifications of ->d_inode and ->d_flags, which both provides the barriers and (which is what matters for x86) guarantees that ->d_seq match on recheck (which we do anyway) means that ->d_inode and ->d_flags match each other. * RCU users of that part of ->d_flags should be verified by ->d_seq check (we already are doing that - should_follow_link() didn't and that was one of the bugs that got fixed). * non-RCU users either have the parent locked (which stabilizes everything) or have dentry pinned and positive (ditto). Checking that dentry is negative (either by looking at inode or flags) does not guarantee that it will stay such unless the parent is locked anyway. IOW, the games with barriers and order of assignments between ->d_inode and ->d_flags do not actually buy us anything useful. * in case of __dentry_kill() we do *NOT* surround the stores to ->d_inode/->d_flags with ->d_seq bumps, but that's safe since by that point we had already done __d_drop(), so RCU reader either doesn't find the dentry in the first place, or gets ->d_seq bumped (by 2) between the moment it's been fetched by __d_lookup_rcu() finding the sucker and the moment when ->d_inode/->d_flags get changed. If RCU reader gets in before that, it sees consistent ->d_inode/->d_flags, as they used to be. It will eventually fail to grab a reference to that dentry, but that's not our problem. If it gets in late enough to see ->d_inode and/or ->d_flags changed, it will fail ->d_seq check and ignore the values it has fetched. Again, no need for a barrier between ->d_inode and ->d_flags stores in that case. As the matter of fact, I'm somewhat tempted to make static void dentry_unlink_inode(struct dentry * dentry) __releases(dentry->d_lock) __releases(dentry->d_inode->i_lock) { struct inode *inode = dentry->d_inode; bool hashed = !d_unhashed(dentry); if (hashed) raw_write_seqcount_begin(&dentry->d_seq); __d_clear_type_and_inode(dentry); hlist_del_init(&dentry->d_u.d_alias); if (hashed) raw_write_seqcount_end(&dentry->d_seq); spin_unlock(&dentry->d_lock); spin_unlock(&inode->i_lock); if (!inode->i_nlink) fsnotify_inoderemove(inode); if (dentry->d_op && dentry->d_op->d_iput) dentry->d_op->d_iput(dentry, inode); else iput(inode); } and replace dentry_iput() in its only caller with if (dentry->d_inode) dentry_unlink_inode(dentry); /* will drop ->d_lock */ else spin_unlock(&dentry->d_lock); That would get rid of annoying code duplication, but I would like to see profiles - the cost of branches might very well get unpleasant. Not sure, and that part definitely isn't a -stable fodder. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html