Re: [PATCH RFC 0/4] fs: port files to rcuref_long_t

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Oct 05, 2024 at 02:42:25PM GMT, Linus Torvalds wrote:
> On Sat, 5 Oct 2024 at 12:17, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > Note that atomic_inc_not_zero() contained a full memory barrier that we
> > relied upon. But we only need an acquire barrier and so I replaced the
> > second load from the file table with a smp_load_acquire(). I'm not
> > completely sure this is correct or if we could get away with something
> > else. Linus, maybe you have input here?
> 
> I don't think this is valid.
> 
> You go from  this:
> 
>         file = rcu_dereference_raw(*f);
>         if (!file)
>                 return NULL;
>         if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
>                 return ERR_PTR(-EAGAIN);
>         file_reloaded = rcu_dereference_raw(*f);
> 
> to this:
> 
>         file = rcu_dereference_raw(*f);
>         if (!file)
>                 return NULL;
>         if (unlikely(!rcuref_long_get(&file->f_count)))
>                 return ERR_PTR(-EAGAIN);
>         file_reloaded = smp_load_acquire(f);
> 
> and the thing is, that rcuref_long_get() had better be a *full* memory barrier.
> 
> The smp_load_acquire() does absolutely nothing: it means that the load
> will be done before *subsequent* memory operations. But it is not a
> barrier to preceding memory operations.

Right, because we need the increment to be ordered against the second
load not the two loads against each other.

> 
> So if rcuref_long_get() isn't ordered (and you made it relaxed, the
> same way the existing rcuref_get() is), the CPU can basically move
> that down past the smp_load_acquire(), so the code actually
> effectively turns into this:
> 
>         file = rcu_dereference_raw(*f);
>         if (!file)
>                 return NULL;
>         file_reloaded = smp_load_acquire(f);
>         if (unlikely(!rcuref_long_get(&file->f_count)))
>                 return ERR_PTR(-EAGAIN);
> 
> and now the "file_reloaded" thing is completely pointless.
> 
> Of course, you will never *see* this on x86, because all atomics are
> always full memory barriers on x86, so when you test this all on that
> noce 256-thread CPU, it all works fine. Because on x86, the "relaxed"
> memory ordering isn't.

Yeah, I was aware of that but I don't have a 64bit arm box. It would be
nice for testing anyway.

> 
> So no. You can't use atomic_long_add_negative_relaxed() in the file
> handling, because it really does require a proper memory barrier.
> 
> Now, I do think that replacing the cmpxchg loop with
> atomic_long_add_negative() is a good idea.
> 
> But you can't do it this way, and you can't use the RCUREF logic and
> just extend it to the file ref.
> 
> I would suggest you take that whole "rcuref_long_get()" code and *not*
> try to make it look like the rcuref code, but instead make it
> explicitly just about the file counting. Because the file counting
> really does have these special rules.
> 
> Also, honestly, the only reason the file counting is using a "long" is
> because the code does *NOT* do overflow checking. But once you start
> looking at the sign and do conditional increments, you can actually
> just make the whole refcount be a "int" instead, and make "struct
> file" potentially smaller.

I've already shrunk it down to three cachelines.

> 
> And yes, that requires that people who get a file() will fail when the
> file count goes negative, but that's *good*.

Yeah, the overflow protection would be neat to have.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux