On Tue, Jul 12, 2011 at 08:56:35PM -0700, Linus Torvalds wrote: > It would take the locks a few more times, but it would avoid the nasty > lock ordering issues exactly because it drops them in between rather > than try to hold them all at once. And d_move() is *not* a performance > critical area, afaik, so the point would be to make the locking more > straightforward and avoid the horrible subtlety. > > Crazy? Stupid? What am I missing? It's probably something obvious. It's loops over d_subdirs of given dentry. IOW, dentry can be found not just via the hash chains and those iterators (ones outside of fs/dcache.c) don't play with rename seqlock. They just rely on ->d_lock on dentry being enough to stabilize the list of children. It's not all that horrible, since i_mutex on parent (held in quite a few of those) is enough to prevent d_move() altogether. Or iterator sits on fs that never does d_move/d_materialise_unique at all. There are exceptions. The most generic one is __fsnotify_update_child_dentry_flags(), but that's not all. E.g. coda ->d_revalidate(), possibly some ceph code... Overall, it might be doable, but it'll take some massage in strange places. Hell knows; I think at that point in release cycle we don't want to go there. After looking through ->d_lock users I'm mostly convinced that unlazy_walk() fix + making damn sure we don't try to create loops in d_materialise_unique() + adding missing ->i_mutex where needed is the way to go for now. We have several places where ->d_lock overlap is not parent-to-child, but these really can't lead to deadlocks - fs is specialized enough to prevent them. However, looking through these loops over ->d_subdirs shows *another* pile of bugs, like this one: mutex_lock(&bus->d_inode->i_mutex); list_for_each_entry(dev, &bus->d_subdirs, d_u.d_child) if (dev->d_inode) update_dev(dev); mutex_unlock(&bus->d_inode->i_mutex); in drivers/usb/core/inode.c:update_bus(). See the problem? Sure, it's ramfs-like: all dentries are pinned down until unlink/rmdir, which is blocked by i_mutex on parent. And yes, d_move() is also not a problem (ditto). However, if you open a file and unlink it, dentry will be unpinned, but stay around. It will be unhashed, but it'll be on ->d_subdir. Close the file and dentry will be killed, without i_mutex on parent. Note that ->d_locked will be taken - both on victim and parent. But the quoted sucker takes no ->d_lock at all. Bummer... IOW, iterators relying _purely_ on i_mutex on parent are currently fucked. We have more such cases. Another one is debugfs_remove_recursive(). I wonder if we simply ought to evict the sucker from d_subdirs/d_child in simple_unlink()... It would require verifying that users of that animal are OK with that, but I suspect that most of them will be happy. It might simplify life for readdir() and llseek() on such directories, while we are at it... The "hunt for dentry leaks" logics on umount might become slightly less useful for those filesystems (we'll get "parent is misteriously busy" instead of "child is unlinked, but somehow busy on fs shutdown"), but I doubt it's that serious. Hell knows - I never had to hunt dentry leaks on debugfs/usbfs/etc. BTW, usbfs tree iterators should've been killed a long time ago - we simply need ->permission() and ->getattr() added for usbfs to DTRT instead of this "iterate over all inodes, patch new i_uid/i_gid/i_mode when remount asks to do that" crap. BTW^2: in clk_debugfs_register_one() in assorted arch/arm/mach-*/clock.c d = c->dent; list_for_each_entry_safe(child, child_tmp, &d->d_subdirs, d_u.d_child) debugfs_remove(child); debugfs_remove(c->dent); should be debugfs_remove_recursive(). Here we don't even bother with i_mutex... BTW^3: sel_remove_classes() plays very unsafe games - no ->i_mutex at all, no ->d_lock until we get leaf directories. Root-only, but... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html