On Thu, Aug 25, 2022 at 7:16 PM NeilBrown <neilb@xxxxxxx> wrote: > > If a filesystem supports parallel modification in directories, it sets > FS_PAR_DIR_UPDATE on the file_system_type. lookup_open() and the new > lookup_hash_update() notice the flag and take a shared lock on the > directory, and rely on a lock-bit in d_flags, much like parallel lookup > relies on DCACHE_PAR_LOOKUP. Ugh. I absolutely believe in the DCACHE_PAR_LOOKUP model, and in "parallel updates" being important, but I *despise* locking code like this + if (wq && IS_PAR_UPDATE(dir)) + inode_lock_shared_nested(dir, I_MUTEX_PARENT); + else + inode_lock_nested(dir, I_MUTEX_PARENT); and I really really hope there's some better model for this. That "wq" test in particular is just absolutely disgusting. So now it doesn't just depend on whether the directory supports parallel updates, now the caller can choose whether to do the parallel thing or not, and that's how "create" is different from "rename". And that last difference is, I think, the one that makes me go "No. HELL NO". Instead of it being up to the filesystem to say "I can do parallel creates, but I need to serialize renames", this whole thing has been set up to be about the caller making that decision. That's just feels horribly horribly wrong. Yes, I realize that to you that's just a "later patches will do renames", but what if it really is a filesystem issue where the filesystem can easily handle new names, but needs something else for renames because it has various cross-directory issues, perhaps? So I feel this is fundamentally wrong, and this ugliness is a small effect of that wrongness. I think we should strive for (a) make that 'wq' and DCACHE_PAR_LOOKUP bit be unconditional (b) aim for the inode lock being taken *after* the _lookup_hash(), since the VFS layer side has to be able to handle the concurrency on the dcache side anyway (c) at that point, the serialization really ends up being about the call into the filesystem, and aim to simply move the inode_lock{_shared]_nested() into the filesystem so that there's no need for a flag and related conditional locking at all. Because right now I think the main reason we cannot move the lock into the filesystem is literally that we've made the lock cover not just the filesystem part, but the "lookup and create dentry" part too. But once you have that "DCACHE_PAR_LOOKUP" bit and the d_alloc_parallel() logic to serialize a _particular_ dentry being created (as opposed to serializing all the sleeping ops to that directly), I really think we should strive to move the locking - that no longer helps the VFS dcache layer - closer to the filesystem call and eventually into it. Please? I think these patches are "mostly going in the right direction", but at the same time I feel like there's some serious mis-design going on. Linus