On Wed, Mar 12, 2025 at 1:29 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 3/11/25 23:24, Alexei Starovoitov wrote:
> > On Tue, Mar 11, 2025 at 9:21 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> On 3/11/25 17:31, Mateusz Guzik wrote:
> >> > On Tue, Mar 11, 2025 at 5:21 PM Sebastian Andrzej Siewior
> >> > <bigeasy@xxxxxxxxxxxxx> wrote:
> >> >>
> >> >> On 2025-03-11 16:44:30 [+0100], Mateusz Guzik wrote:
> >> >> > On Fri, Feb 21, 2025 at 06:44:22PM -0800, Alexei Starovoitov wrote:
> >> >> > > +#define __localtry_lock(lock)				\
> >> >> > > +	do {						\
> >> >> > > +		localtry_lock_t *lt;			\
> >> >> > > +		preempt_disable();			\
> >> >> > > +		lt = this_cpu_ptr(lock);		\
> >> >> > > +		local_lock_acquire(&lt->llock);		\
> >> >> > > +		WRITE_ONCE(lt->acquired, 1);		\
> >> >> > > +	} while (0)
> >> >> >
> >> >> > I think these need compiler barriers.
> >> >> >
> >> >> > I checked with gcc docs (https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html)
> >> >> > and found this as confirmation:
> >> >> > > Accesses to non-volatile objects are not ordered with respect to volatile accesses.
> >> >> >
> >> >> > Unless the Linux kernel is built with some magic to render this moot(?).
> >> >>
> >> >> You say we need a barrier() after the WRITE_ONCE()? If so, we need it in
> >> >> the whole file…
> >> >>
> >> >
> >> > I see the original local_lock machinery on the stock kernel works fine
> >> > as it expands to the preempt pair which has the appropriate fences. If
> >> > debug is added, the "locking" remains unaffected, but the debug state
> >> > might be bogus when looked at from the "wrong" context and adding the
> >> > compiler fences would trivially sort it out. I don't think it's a big
> >> > deal for *their* case, but patching that up should not raise any
> >> > eyebrows and may prevent eyebrows from going up later.
> >> >
> >> > The machinery added in this patch does need the addition for
> >> > correctness in the base operation though.
> >>
> >> Yeah my version of this kind of lock in sheaves code had those barrier()'s,
> >> IIRC after you or Jann told me. It's needed so that the *compiler* does not
> >> e.g. reorder a write to the protected data to happen before the
> >> WRITE_ONCE(lt->acquired, 1) (or after the WRITE_ONCE(lt->acquired, 0) in
> >> unlock).
> >
> > I think you all are missing a fine print in gcc doc:
> > "Unless...can be aliased".
> > The kernel is compiled with -fno-strict-aliasing.
> > No need for barrier()s here.
>
> Note I know next to nothing about these things, but I see here [1]:
>
> "Whether GCC actually performs type-based aliasing analysis depends on the
> details of the code. GCC has other ways to determine (in some cases) whether
> objects alias, and if it gets a reliable answer that way, it won't fall back
> on type-based heuristics. [...] You can turn off type-based aliasing
> analysis by giving GCC the option -fno-strict-aliasing."
>
> I'd read that as -fno-strict-aliasing only disables TBAA, but that does not
> necessarily mean anything can be assumed to be aliased with anything?

That's correct.

> And if we e.g. have a pointer to memcg_stock_pcp through which we access the
> stock_lock and the other (protected) fields, and that pointer doesn't change
> in between, I imagine gcc can reliably determine these can't alias?

Though my last gcc commit was very long ago, here is a simple example where
the compiler can reorder/combine stores:

struct s {
	short a, b;
} *p;

p->a = 1;
p->b = 2;

The compiler can keep them as-is, combine them, or reorder them even with
-fno-strict-aliasing, because it can determine that a and b don't alias.

But after re-reading the gcc doc on volatiles it's clear that extra
barriers are not necessary. The main part:

"The minimum requirement is that at a sequence point all previous accesses to
volatile objects have stabilized"

So anything after WRITE_ONCE(lt->acquired, 1); will not be hoisted up, and
that's what we care about here.