Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression

Mateusz Guzik <mjguzik@xxxxxxxxx> · Tue, 2 Jul 2024 19:46:39 +0200

On Tue, Jul 2, 2024 at 7:28 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2 Jul 2024 at 10:03, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> >
> > I was thinking a different approach.
> >
> > A lookup variant which resolves everything and returns the dentry + an
> > information whether this is rcu mode.
>
> That would work equally.
>
> But the end result ends up being very similar: you need to hook into
> that final complete_walk() -> try_to_unlazy() -> legitimize_path() and
> check a flag whether you actually then do "get_lockref_or_dead()" or
> not.
>

Yeah, the magic routine that validates whether you can pretend the ref
was taken would wrap it.

> It really *shouldn't* be too bad, but this is just so subtle code that
> it just takes a lot of care. Even if the patch itself ends up not
> necessarily being very large.
>
> As mentioned, I've looked at it, but it always ended up being _just_
> scary enough that I never really started doing it.
>

I implemented something like this as a demo in FreeBSD a few years
back, and at least it did not blow up. The work did not get committed,
though, because I could not be arsed to productize it.

tbf, if anything, the only shady thing I see here is that stat et al.
do their work without any locks held or seqc verification in the
current kernel.

In FreeBSD this operated directly on vnodes (here one can pretend they
are inodes). I added a sequence counter to the vnode itself, and any
state change like write, setattr, unlink or whatever would bump it.
Then something like stat could safely read whatever it wants
locklessly, with a final check for a matching seqc indicating that
nothing changed in the meantime.
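A minimal userspace sketch of that scheme, using C11 atomics. Everything here (`struct obj`, `obj_setattr()`, `obj_stat_try()` and the rest) is hypothetical and is not FreeBSD's or Linux's actual code; it assumes writers are already serialized by some per-object lock that is not shown. The fields are atomics only so that a reader racing with a writer is not a C11 data race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vnode/inode: a sequence counter plus the
 * attributes a stat-like reader wants. */
struct obj {
	atomic_uint seq;		/* odd while a modification is in progress */
	atomic_uint_least64_t size;
	atomic_uint mode;
};

struct obj_stat {
	uint64_t size;
	unsigned mode;
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->seq, 0);
	atomic_init(&o->size, 0);
	atomic_init(&o->mode, 0);
}

/* Writer side: assumed to run under the object's lock. */
static void obj_write_begin(struct obj *o)
{
	unsigned s = atomic_load_explicit(&o->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no nested or concurrent writers */
	atomic_store_explicit(&o->seq, s + 1, memory_order_relaxed);
	/* Order the odd counter value before the field stores that follow. */
	atomic_thread_fence(memory_order_release);
}

static void obj_write_end(struct obj *o)
{
	unsigned s = atomic_load_explicit(&o->seq, memory_order_relaxed);

	/* Release: the field stores are visible before the even value. */
	atomic_store_explicit(&o->seq, s + 1, memory_order_release);
}

/* A state change such as setattr bumps the counter around its stores. */
static void obj_setattr(struct obj *o, uint64_t size, unsigned mode)
{
	obj_write_begin(o);
	atomic_store_explicit(&o->size, size, memory_order_relaxed);
	atomic_store_explicit(&o->mode, mode, memory_order_relaxed);
	obj_write_end(o);
}

/* Reader side: lockless snapshot. Returns false if a writer was active
 * or completed in between; the caller then retries or takes the lock. */
static bool obj_stat_try(struct obj *o, struct obj_stat *st)
{
	unsigned s0 = atomic_load_explicit(&o->seq, memory_order_acquire);

	if (s0 & 1)
		return false;	/* modification in progress */
	st->size = atomic_load_explicit(&o->size, memory_order_relaxed);
	st->mode = atomic_load_explicit(&o->mode, memory_order_relaxed);
	/* Keep the field loads ordered before the final re-check. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&o->seq, memory_order_relaxed) == s0;
}
```

The retry policy (spin a few times, then fall back to the locked path) is left to the caller here. Linux's seqcount_t API (read_seqcount_begin()/read_seqcount_retry()) packages the same begin/re-check idea, and it is what dentries already carry in d_seq.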

Not having a "someone is messing with the inode" indicator in Linux
(only the dentry has one) is definitely worrisome when pushing RCU
further, if that's what you meant.

Again, I'm going to poke around, if only for kicks, when I find the
time, and we will see what happens.
-- 
Mateusz Guzik <mjguzik gmail.com>