Re: [PATCH/RFC 00/10 v5] Improve scalability of directory operations

"John Stoffel" <john@xxxxxxxxxxx> · Fri, 26 Aug 2022 10:42:00 -0400

>>>>> "NeilBrown" == NeilBrown  <neilb@xxxxxxx> writes:

NeilBrown> [I made up "v5" - I haven't been counting]

My first comments, but I'm not a serious developer...

NeilBrown> VFS currently holds an exclusive lock on the directory while making
NeilBrown> changes: add, remove, rename.
NeilBrown> When multiple threads make changes in the one directory, the contention
NeilBrown> can be noticeable.
NeilBrown> In the case of NFS with a high latency link, this can easily be
NeilBrown> demonstrated.  NFS doesn't really need VFS locking as the server ensures
NeilBrown> correctness.

NeilBrown> Lustre uses a single(?) directory for object storage, and has patches
NeilBrown> for ext4 to support concurrent updates (Lustre accesses ext4 directly,
NeilBrown> not via the VFS).

NeilBrown> XFS (it is claimed) doesn't its own locking and doesn't need the VFS to
NeilBrown> help at all.

This sentence makes no sense to me... I assume you meant to say "...does
it's own locking..."

NeilBrown> This patch series allows filesystems to request a shared lock on
NeilBrown> directories and provides serialisation on just the affected name, not the
NeilBrown> whole directory.  It changes both the VFS and NFSD to use shared locks
NeilBrown> when appropriate, and changes NFS to request shared locks.

Are there any performance results?  Why wouldn't we just do a shared
locked across all VFS based filesystems?  

NeilBrown> The central enabling feature is a new dentry flag DCACHE_PAR_UPDATE
NeilBrown> which acts as a bit-lock.  The ->d_lock spinlock is taken to set/clear
NeilBrown> it, and wait_var_event() is used for waiting.  This flag is set on all
NeilBrown> dentries that are part of a directory update, not just when a shared
NeilBrown> lock is taken.

NeilBrown> When a shared lock is taken we must use alloc_dentry_parallel() which
NeilBrown> needs a wq which must remain until the update is completed.  To make use
NeilBrown> of concurrent create, kern_path_create() would need to be passed a wq.
NeilBrown> Rather than the churn required for that, we use exclusive locking when
NeilBrown> no wq is provided.

Is this a per-operation wq or a per-directory wq?  Can there be issues
if someone does something silly like having 1,000 directories, all of
which have multiple processes making parallel changes?  

Does it degrade gracefully if a wq can't be allocated?  

NeilBrown> One interesting consequence of this is that silly-rename becomes a
NeilBrown> little more complex.  As the directory may not be exclusively locked,
NeilBrown> the new silly-name needs to be locked (DCACHE_PAR_UPDATE) as well.
NeilBrown> A new LOOKUP_SILLY_RENAME is added which helps implement this using
NeilBrown> common code.

NeilBrown> While testing I found some odd behaviour that was caused by
NeilBrown> d_revalidate() racing with rename().  To resolve this I used
NeilBrown> DCACHE_PAR_UPDATE to ensure they cannot race any more.

NeilBrown> Testing, review, or other comments would be most welcome,