Re: [PATCH 07/22] shrink_dentry_list(): no need to check that dentry refcount is marked dead

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 09, 2023 at 02:53:04PM +0100, Christian Brauner wrote:
> On Thu, Nov 09, 2023 at 06:20:41AM +0000, Al Viro wrote:
> > ... we won't see DCACHE_MAY_FREE on anything that is *not* dead
> > and checking d_flags is just as cheap as checking refcount.
> > 
> > Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> > ---
> 
> Could also be a WARN_ON_ONCE() on d_lockref.count > 0 if DCACHE_MAY_FREE
> is set but probably doesn't matter,

>= 0, actually, but... TBH, in longer run I would rather represent the
empty husk state (instance just waiting for shrink_dentry_list() to remove
it from its list and free the sucker) not by a bit in ->d_flags, but
by a specific negative value in ->d_lockref.count.

After this series we have the following picture: all real instances come
from __alloc_dentry().  Possible states after that
  Busy <-> Retained -> Dying -> Freeing
                         |        ^
			 V        |
			 Husk ----/

Busy and Retained are live dentries, with positive and zero refcount
resp.; that's the pure refcounting land.  Eventually we get to
succesful lock_for_kill(), which leads to call of __dentry_kill().
That's where the state becomes Dying.  On the way out of __dentry_kill()
(after ->d_prune()/->d_iput()/->d_release()) we either switch to Freeing
(only RCU references remain, actual memory object freed by the end of it)
or Husk (the only non-RCU reference is that of a shrink list it's on).
Husk, in turn, switches to Freeing as soon as shrink_dentry_list() gets
around to it and takes it out of its shrink list.  If shrink_dentry_list()
picks an instance in Dying state, it quietly removes it from the shrink
list and leaves it for __dentry_kill() to deal with.

All transitions are under ->d_lock.  ->d_lockref.count for those is
positive in Busy, zero in Retained and -128 in Dying, Husk and Freeing.
Husk is distinguished by having DCACHE_MAY_FREE set.  Freeing has no
visible difference from Dying.

All refcount changes are under ->d_lock.  None of them should _ever_
change the negative values.  If the last part is easy to verify (right
now it's up to "no refcount overflows, all callers of dget_dlock() are
guaranteed to be dealing with Busy or Retained instances"), it might
make sense to use 3 specific negative values for Dying/Husk/Freeing.
What's more, it might make sense to deal with overflows by adding a
separate unsigned long __d_large_count_dont_you_ever_touch_that_directly;
and have the overflow switch to the 4th special negative number indicating
that real counter sits in there.

I'm not 100% convinced that this is the right way to handle that mess,
but it's an approach I'm at least keeping in mind.  Anyway, we need to
get the damn thing documented and understandable before dealing with
overflows becomes even remotely possible.  As it is, it's way too
subtle and reasoning about correctness is too convoluted and brittle.

PS: "documented" includes explicit description of states, their
representations and transitions between them, as well as the objects
associated with the instance in each of those, what references
are allowed in each state, etc.  And the things like in-lookup,
cursor, etc. - live dentries have sub-states as well...




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux