Re: [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Thu, 03 Jun 2021 17:02:28 -0500

Miklos Szeredi <miklos@xxxxxxxxxx> writes:

> On Thu, 3 Jun 2021 at 19:26, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Ian Kent <raven@xxxxxxxxxx> writes:
>>
>> > If there are many lookups for non-existent paths these negative lookups
>> > can lead to a lot of overhead during path walks.
>> >
>> > The VFS allows dentries to be created as negative and hashed, and caches
>> > them so they can be used to reduce the fairly high overhead alloc/free
>> > cycle that occurs during these lookups.
>> >
>> > Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
>> > ---
>> >  fs/kernfs/dir.c |   55 +++++++++++++++++++++++++++++++++----------------------
>> >  1 file changed, 33 insertions(+), 22 deletions(-)
>> >
>> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>> > index 4c69e2af82dac..5151c712f06f5 100644
>> > --- a/fs/kernfs/dir.c
>> > +++ b/fs/kernfs/dir.c
>> > @@ -1037,12 +1037,33 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>> >       if (flags & LOOKUP_RCU)
>> >               return -ECHILD;
>> >
>> > -     /* Always perform fresh lookup for negatives */
>> > -     if (d_really_is_negative(dentry))
>> > -             goto out_bad_unlocked;
>> > +     mutex_lock(&kernfs_mutex);
>> >
>> >       kn = kernfs_dentry_node(dentry);
>> > -     mutex_lock(&kernfs_mutex);
>>
>> Why bring kernfs_dentry_node inside the mutex?
>>
>> The inode lock of the parent should protect negative to positive
>> transitions not the kernfs_mutex.  So moving the code inside
>> the mutex looks unnecessary and confusing.
>
> Except that d_revalidate() may or may not be called with parent lock
> held.

I grant that this works because kernfs_io_lookup today holds
kernfs_mutex over d_splice_alias.

The problem is that the kernfs_mutex only should be protecting the
kernfs data structures not the vfs data structures.

Reading through the code history that looks like a hold over from when
sysfs lived in the dcache before it was reimplemented as a distributed
file system.  So it was probably a complete over sight and something
that did not matter.

The big problem is that if the code starts depending upon the
kernfs_mutex (or the kernfs_rwsem) to provide semantics the rest of the
filesystems does not the code will diverge from the rest of the
filesystems and maintenance will become much more difficult.

Diverging from other filesystems and becoming a maintenance pain has
already been seen once in the life of sysfs and I don't think we want to
go back there.

Further extending the scope of lock, when the problem is that the
locking is causing problems seems like the opposite of the direction we
want the code to grow.

I really suspect all we want kernfs_dop_revalidate doing for negative
dentries is something as simple as comparing the timestamp of the
negative dentry to the timestamp of the parent dentry, and if the
timestamp has changed perform the lookup.  That is roughly what
nfs does today with negative dentries.

The dentry cache will always lag the kernfs_node data structures, and
that is fundamental.  We should take advantage of that to make the code
as simple and as fast as we can not to perform lots of work that creates
overhead.

Plus the kernfs data structures should not change much so I expect
there will be effectively 0 penalty in always performing the lookup of a
negative dentry when the directory itself has changed.

Eric