Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Mon, Sep 17, 2012 at 1:39 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: >> >> Egads... The problem is real and analysis, AFAICS, is correct, but result >> is extremely ugly ;-/ > > Agreed. > > The problem (or at least one *part* of the problem) is that the "goto > rename_retry" case is done for two different entities entirely: > > - the "try_to_ascend()" failure path, which can happen even when > renamelock is held for writing. > > - the "if we weren't write-locked before, and the read-lock failed" > case (which obviously cannot happen if we already held things for > writing) > > That said, I'm not sure why/how that try_to_ascend() could even fail > when we're holding things locked. I guess it's the DCACHE_DISCONNECTED > case that triggers. Yes, with the test cases that IBM were using it is DCACHE_DISCONNECTED case that triggers the double-lock. Trond was misusing DCACHE_DISCONNECTED and this made the failure in try_to_ascend() much more likely (and bogus). But there is a case, which is triggered rarely if ever, when try_to_ascend() failure with rename_lock held is perfectly valid. try_to_ascend() does an ellaborate locking dance that aims to ensure that child remains valid after having ascended to parent. It nees a valid child because it needs the next sibling to continue tree traversal. If the child becomes invalid (final dput()) then the whole tree traversal needs to be restarted since we don't know where we left off. So we basically do 1. take the RCU lock 2. unlock child 3. lock parent 4. check whether child has been freed, if yes restart traversal 5. release RCU 6. find next sibling (child->d_child.next) 7. lock new child It is step 4. that needs the RCU guarantees so we can check child's dentry flag (DCACHE_DISCONNECTED flag presently). After that d_lock on parent will protect "child" from going away (d_kill or d_move) and we can get hold of the next sibling. Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html