Valerie Aurora:
> No, that's not a sufficient description and leaves open questions
> about all sorts of deadlocks and race conditions. For example,
> inotify events occur while holding locks only on one layer. You
> obviously need to lock the top layer to update the inheritance and
> parent-child relationships. Now you are locking the lower layer first
> and the top layer second, which is the reverse of the usual order.

I don't agree about the deadlock and race condition.
When a user modifies the dir hierarchy on the layer directly while
aufs_rename() is running, aufs will detect it after lock_rename().
It behaves like this.
- decide the layer where the actual rename operates. create the dir
  hierarchy on it if necessary.
- lock_rename() for the layer
- call ->rename(), or
- if the file being renamed exists on the lower readonly layer, aufs will
  copy it up to the upper writable layer under the rename target name.
  In this case, ->rename() is not called.

If a user changes the dir hierarchy directly on the layer before
aufs_rename(), then the notify event tells aufs about it and aufs gets the
latest hierarchy.
If it happens before lock_rename() in aufs_rename(), aufs verifies the
relationship between the target child and the locked dir. If it differs,
aufs returns EBUSY.
Of course, lock_rename() follows the "ancestors first" order described in
Documentation/filesystems/directory-locking.

> around on the lower layer is safe. In general, your first task is to
> show a global lock ordering to prove lack of deadlocks (which I don't
> think you should spend time on because most VFS experts think it is
> impossible to do with two read-write layers).

Since you may not read this anymore and other people don't seem to be
interested in aufs, it may not be meaningful to write down the locking in
aufs. But I will try.

At first,
- since aufs is an FS, it has its own super_block, dentry and inode.
- super_block, dentry and inode in aufs have private data which contains
  an rwsem.
- the locking order for these rwsems is child-first.
- aufs specifies FS_RENAME_DOES_D_MOVE.

locking order in aufs_rename
+ down_read() for aufs sb protects sb from branch-add, delete.
+ two down_write()s for src and dest child protect them from other
  processes in aufs.
+ down_write() for the dst_parent.
+ decide the layer where we will operate, by comparing the index of
  layers where the targets exist and the layer attribute (ro, rw).
+ copyup the dest dir hierarchy if necessary, by repeating
  - dget_parent(), down/up_read() for the parent (in aufs)
  - mutex_lock() for the dir (on the layer) to mkdir the non-existing
    child dir on the layer and verify the parent-child relationship.
  - mkdir and setattr on the layer.
  - mutex_unlock() the dir on the layer.
+ test they are rename-able
  if it is a dir, it must be empty (logically) or must not have children
  on the multiple branches.
+ if src_parent and dst_parent differ, down_write both.
  up_write for dst_parent may be necessary to keep the "child-first"
  rule in aufs.
(from here the "sub-VFS" characteristic of aufs appears)
+ lock_rename() on the layer and verify every relationship between
  child and parent.
+ test the src_child is deletable.
+ test the dst_child is add-able or deletable if it exists.
+ vfs_rename() on the layer or copyup src_child as a dst_child name.
+ unlock_rename() on the layer
(return to aufs world)
+ d_drop() dst_child if necessary.
+ d_move()
+ up_write() for src_parent and dst_parent
+ up_write() for src_child and dst_child
+ up_read() for aufs sb

Strictly speaking, there are more things which aufs_rename() handles, such
as inode attributes, whiteout, opaque-dir, internal pointers to the object
on the layer, and temporary dir-name. But they are essentially unrelated
to the locking order, so I didn't describe them.

Thank you for reading this long mail.
J. R.
Okajima
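
The "lock the dirs ancestor first, then verify the parent-child
relationship, and return EBUSY if it differs" step described in the mail
can be modelled in user space. What follows is only a minimal sketch under
stated assumptions, not aufs or kernel code: Node, lock_two_dirs() and
rename_on_layer() are hypothetical names, a plain threading.Lock stands in
for the directory mutex on the layer, and unrelated directories are simply
ordered by id(), which is a shortcut of this sketch and not what
lock_rename() does.

```python
import errno
import threading


class Node:
    """Hypothetical stand-in for a file or directory on one layer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # None for the root of the layer
        self.children = {}            # name -> Node
        self.lock = threading.Lock()  # stands in for the directory mutex
        if parent is not None:
            parent.children[name] = self


def is_ancestor(a, b):
    """Return True if a is a proper ancestor of b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def lock_two_dirs(d1, d2):
    """Lock both directories, ancestor first; return them in lock order."""
    if d1 is d2:
        order = [d1]
    elif is_ancestor(d2, d1):
        order = [d2, d1]
    elif is_ancestor(d1, d2):
        order = [d1, d2]
    else:
        # Assumption of this sketch: any fixed global order avoids
        # deadlock between unrelated directories, so sort by id().
        order = sorted((d1, d2), key=id)
    for d in order:
        d.lock.acquire()
    return order


def unlock_dirs(order):
    """Release the locks in the reverse of the order they were taken."""
    for d in reversed(order):
        d.lock.release()


def rename_on_layer(src_dir, name, dst_dir, new_name, child):
    """Move child from src_dir/name to dst_dir/new_name.

    child is the object the caller looked up before taking the locks.
    Returns 0 on success, or -errno.EBUSY if the hierarchy was changed
    in the meantime so that child is no longer src_dir/name.
    """
    locked = lock_two_dirs(src_dir, dst_dir)
    try:
        # Verify the relationship only after the locks are held.
        if child.parent is not src_dir or src_dir.children.get(name) is not child:
            return -errno.EBUSY
        del src_dir.children[name]
        dst_dir.children[new_name] = child
        child.parent = dst_dir
        child.name = new_name
        return 0
    finally:
        unlock_dirs(locked)


if __name__ == "__main__":
    root = Node("/")
    a = Node("a", root)
    b = Node("b", root)
    f = Node("f", a)
    print(rename_on_layer(a, "f", b, "g", f))  # view matches the layer
    print(rename_on_layer(a, "f", b, "h", f))  # stale view: f has moved
```

The second call models a caller whose view of the hierarchy is stale: the
check made under the locks fails, so nothing is modified and the caller
gets -EBUSY back instead of operating on the wrong parent.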