On Mon, 09 Sep 2024, Jeff Layton wrote: > On Mon, 2024-09-09 at 15:06 +1000, NeilBrown wrote: > > The pair of bloom filtered used by delegation_blocked() was intended to > > block delegations on given filehandles for between 30 and 60 seconds. A > > new filehandle would be recorded in the "new" bit set. That would then > > be switch to the "old" bit set between 0 and 30 seconds later, and it > > would remain as the "old" bit set for 30 seconds. > > > > Since we're on the subject... > > 60s seems like an awfully long time to block delegations on an inode. > Recalls generally don't take more than a few seconds when things are > functioning properly. > > Should we swap the bloom filters more often? Or should we take a completely different approach? Or both? I'm bothered by the fact that this bug in a heuristic caused rename to misbehave this way. So I did some exploration. try_break_deleg / break_deleg_wait are called: - notify_change() - chmod_common, chown_common, and vfs_utimes waits - vfs_unlink() - do_unlinkat waits for the delegation to be broken - vfs_link() - do_linkat waits - vfs_rename() - do_renameat2 waits - vfs_set_acl() - waits itself - vfs_remove_acl() - ditto - __vfs_setxattr_locked() - vfs_setxattr waits - __vfs_removexattr_locked() - vfs_removexattr waits I would argue that *none* of these justify delaying further delegations once the operation itself has completed. They aren't the sort of things that suggest on-going cross-client "sharing" which delegations interfere with. I imagine try_break_lease would increment some counter in ->i_flctx which prevents further leases being taken, and some new call near where break_deleg_wait() is called will decrement that counter. If waiting for a lease to break, call break_deleg_wait() and retry, else break_deleg_done(). Delegations are also broken by calls break_lease() which comes from nfsd_open_break_lease() for conflicting nfsd opens. I think there *are* indicitive of shares and *should* result in the filehandle being recorded in the bloom filter. Maybe if break_lease() in nfsd_open_break_lease() returns -EWOULDBLOCK, then the filehandle could be added to the bloom filter at that point. do_dentry_open also calls break_lease() but we still want the filehandle to go in the bloom-filter in that case .... maybe the lm_break callback could check i_writecount and i_readcount and use those to determine if delaying delegations is appropriate. wrt the time to block delegation for, I'm not sure that it matters much. Once we focus the delay on files that are concurrently opened from multiple clients (not both read-only) I think there are some files (most) that are never shared, and some files that are often shared. We probably don't want delegations for often shared files at all. So I'd be comfortable blocking delegations on often-shared files for quite a while. I wouldn't advocate for *longer* than 30 seconds because I don't want the bloom filters to fill up - that significantly reduced their usefulness. So maybe we could also swap filters when the 'new' one becomes 50% full.... NeilBrown