On Sun, Oct 26, 2014 at 07:13:32PM +0000, Al Viro wrote: > The comment is not correct. dentry_kill() won't screw the pointer to > parent; it will, however, screw the pointer to next sibling. > > It used to screw the pointer to parent (which is what the first part of > condition was about). After Nick's series back in January 2011 that > has stopped being true. However, dentry_kill() does > list_del(&dentry->d_u.d_child). Which means that we can't continue > past that point if it has happened. Trond has noticed the breakage > a bit later and added explicit check for ->d_flags, but the damage > was more extensive - Nick had missed the restarts-on-killed logics > hidden in the check for changed ->d_parent and assumed that it's > all about renames, meaning that once rename_lock has been taken exclusive > we won't have restarts at all. With restart-on-killed restored that > wasn't true anymore, invalidating the assumption that we only get to > rename_retry without rename_lock held exclusive. With deadlocks happening > if we ever get there on such pass. Actually, it's even worse than just list_del() possibly screwing .next - that could be worked around by use of __list_del() (and skipping them until we come to one that isn't marked DCACHE_DENTRY_KILLED). Note that d_child shares memory with d_rcu, and _that_ is really nasty - if __dentry_kill() has progressed to dentry_free(), we have ->d_u.d_child.next hopelessly trashed. OTOH, we could make d_rcu share memory with d_alias instead. Hrm... OK, then we'd have rcu_read_lock(); ascend: if (this_parent != parent) { struct dentry *child = this_parent; this_parent = child->d_parent; spin_unlock(&child->d_lock); spin_lock(&this_parent->d_lock); if (need_seqretry(&rename_lock, seq)) { spin_unlock(&this_parent->d_lock); rcu_read_unlock(); goto rename_retry; } next = child->d_child.next; while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED)) { if (next == &this_parent.d_subdirs) goto ascend; child = list_entry(next, struct dentry, d_child); next = next->next; } rcu_read_unlock(); goto resume; } rcu_read_unlock(); if (need_seqretry(&rename_lock, seq)) { spin_unlock(&this_parent->d_lock); goto rename_retry; } in d_walk(), __list_del() instead of list_del() in __dentry_kill(), d_u.d_child turning into d_child everywhere, while d_alias turns into d_u.d_alias... It looks like that way we would get no retries on the second pass. Comments? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html