On Sat, Oct 05, 2024 at 02:42:25PM GMT, Linus Torvalds wrote: > On Sat, 5 Oct 2024 at 12:17, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > Note that atomic_inc_not_zero() contained a full memory barrier that we > > relied upon. But we only need an acquire barrier and so I replaced the > > second load from the file table with a smp_load_acquire(). I'm not > > completely sure this is correct or if we could get away with something > > else. Linus, maybe you have input here? > > I don't think this is valid. > > You go from this: > > file = rcu_dereference_raw(*f); > if (!file) > return NULL; > if (unlikely(!atomic_long_inc_not_zero(&file->f_count))) > return ERR_PTR(-EAGAIN); > file_reloaded = rcu_dereference_raw(*f); > > to this: > > file = rcu_dereference_raw(*f); > if (!file) > return NULL; > if (unlikely(!rcuref_long_get(&file->f_count))) > return ERR_PTR(-EAGAIN); > file_reloaded = smp_load_acquire(f); > > and the thing is, that rcuref_long_get() had better be a *full* memory barrier. > > The smp_load_acquire() does absolutely nothing: it means that the load > will be done before *subsequent* memory operations. But it is not a > barrier to preceding memory operations. Right, because we need the increment to be ordered against the second load not the two loads against each other. > > So if rcuref_long_get() isn't ordered (and you made it relaxed, the > same way the existing rcuref_get() is), the CPU can basically move > that down past the smp_load_acquire(), so the code actually > effectively turns into this: > > file = rcu_dereference_raw(*f); > if (!file) > return NULL; > file_reloaded = smp_load_acquire(f); > if (unlikely(!rcuref_long_get(&file->f_count))) > return ERR_PTR(-EAGAIN); > > and now the "file_reloaded" thing is completely pointless. > > Of course, you will never *see* this on x86, because all atomics are > always full memory barriers on x86, so when you test this all on that > noce 256-thread CPU, it all works fine. Because on x86, the "relaxed" > memory ordering isn't. Yeah, I was aware of that but I don't have a 64bit arm box. It would be nice for testing anyway. > > So no. You can't use atomic_long_add_negative_relaxed() in the file > handling, because it really does require a proper memory barrier. > > Now, I do think that replacing the cmpxchg loop with > atomic_long_add_negative() is a good idea. > > But you can't do it this way, and you can't use the RCUREF logic and > just extend it to the file ref. > > I would suggest you take that whole "rcuref_long_get()" code and *not* > try to make it look like the rcuref code, but instead make it > explicitly just about the file counting. Because the file counting > really does have these special rules. > > Also, honestly, the only reason the file counting is using a "long" is > because the code does *NOT* do overflow checking. But once you start > looking at the sign and do conditional increments, you can actually > just make the whole refcount be a "int" instead, and make "struct > file" potentially smaller. I've already shrunk it down to three cachelines. > > And yes, that requires that people who get a file() will fail when the > file count goes negative, but that's *good*. Yeah, the overflow protection would be neat to have.