RE: [PATCH] locking/memory-barriers.txt: Improve documentation for writel() usage

> From: Arnd Bergmann <arnd@xxxxxxxx>
> Sent: Friday, September 16, 2022 12:09 AM

[..]
> >>
> > I think it is worthwhile because the current documentation indirectly
> > (or incorrectly) indicates that
> > "writel() does wmb() internally, so drivers that have difficulty
> > using writel() can do wmb() + raw write".
> 
> I don't think it's wrong from a barrier perspective though:
> if a driver uses writel_relaxed(), then the only way to guarantee ordering is
> to have a full wmb() before it.
> 
Sorry for the late response.

Yes. The idea is to avoid wmb() whenever it is not necessary.

I will update the example description to reflect it.

> > I see the above pattern in two drivers, and it is not good.
> > It ends up doing dsb(st) on arm64, while the barrier needed is only
> > dmb(oshst).
> >
> > So to fix those two drivers, it is better to first drop the wmb()
> > reference from the documentation when describing writel().
> 
> Yes, this suggestion is correct. On x86 and a few others, I think it's even
> worse when wmb() is an expensive barrier, while writel() is the same as
> writel_relaxed() and the barrier is implied by the MMIO access.
> 
> It might help to spell this out and say that writel() is always preferred over
> wmb()+writel_relaxed().
> 
True.
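
To make the two patterns concrete, here is a rough sketch. The helpers below are userspace stand-ins so that it builds outside the kernel; they are not the kernel implementations, and the descriptor layout and function names (demo_desc, ring_doorbell_*) are made up for illustration only.

```c
#include <stdint.h>

/*
 * Userspace stand-ins, NOT the kernel implementations. In a driver,
 * writel(), writel_relaxed() and wmb() come from <linux/io.h> and
 * <asm/barrier.h>, and the barrier each one issues is arch-specific.
 */
static unsigned int full_barriers;	/* counts explicit wmb() calls */

static inline void wmb(void)
{
	full_barriers++;
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static inline void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

static inline void writel(uint32_t val, volatile uint32_t *addr)
{
	/* Stand-in for the barrier that writel() issues on its own. */
	__atomic_thread_fence(__ATOMIC_RELEASE);
	*addr = val;
}

/* Hypothetical descriptor, for illustration only. */
struct demo_desc {
	uint32_t len;
	uint32_t flags;
};

/* Pattern to avoid: explicit full barrier plus a relaxed MMIO write. */
static void ring_doorbell_discouraged(struct demo_desc *d,
				      volatile uint32_t *db, uint32_t idx)
{
	d->len = 64;
	d->flags = 1;
	wmb();
	writel_relaxed(idx, db);
}

/* Preferred: writel() already orders the earlier memory writes. */
static void ring_doorbell_preferred(struct demo_desc *d,
				    volatile uint32_t *db, uint32_t idx)
{
	d->len = 64;
	d->flags = 1;
	writel(idx, db);
}
```

Both variants leave the same value in the doorbell; the difference is only that the first one pays for an explicit wmb() that the second one does not need.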

> Side note: there are several other problems with wmb()+__raw_writel(),
> which on many architectures does not guarantee any atomicity of the access
> (a word store could get split into four byte stores), breaks endianness
> assumptions and may still not provide the correct barrier semantics.
>
Hmm. So far I have not observed this on arm64, x86_64, or ppc64.
Maybe because the address is 8-byte aligned, we don't see the byte stores?
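
On the endianness point specifically, a small userspace sketch may help. It assumes the generic behaviour where writel() converts the value to little-endian before the store while __raw_writel() stores it in native byte order; the demo_* helpers are stand-ins written for this mail, and a plain byte buffer plays the role of the register.

```c
#include <stdint.h>
#include <string.h>

/*
 * Stand-in for cpu_to_le32(): returns a value whose in-memory bytes are
 * little-endian regardless of the host byte order.
 */
static uint32_t demo_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Stand-in for __raw_writel(): native byte order, no conversion. */
static void demo_raw_writel(uint32_t val, void *addr)
{
	memcpy(addr, &val, sizeof(val));
}

/* Stand-in for writel(): little-endian conversion, then the store. */
static void demo_writel(uint32_t val, void *addr)
{
	demo_raw_writel(demo_cpu_to_le32(val), addr);
}
```

With demo_writel(0x11223344, reg) the buffer holds 44 33 22 11 on any host. With demo_raw_writel() the layout follows the host, so a big-endian machine would produce 11 22 33 44, which is one reason the problem is easy to miss when testing only on little-endian systems.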
 
> >> I see that there is more going on with that function, at least the
> >> loop in post_send_nop() probably just wants to use
> >> __iowrite64_copy(), but that also has no barrier in it, while
> >> changing mlx5_write64() to use iowrite64be() or similar would of
> >> course add excessive barriers inside of the loop.
> >
> > True. All other conversions seem possible.
> > For post_send_nop(), __iowmb() needs to be exposed, which is not
> > available today. Since it is only a one-off user, I am inclined to
> > keep post_send_nop() as-is, but want to improve/correct the rest of
> > the callers in these two drivers.
> 
> __iowmb() is architecture-specific and does not have a well-defined
> behavior. wmb() is probably the best choice for post_send_nop().
> 
Yes.

> Alternatively, one could use __iowrite64_copy() for the first few fields
> followed by a single writel64be for the last one.
> 
__iowrite64_copy() seems the right fit for post_send_nop() compared to the current code.
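
Roughly, the shape suggested above would be a barrier-free bulk copy followed by one ordered write for the final field. The sketch below uses userspace stand-ins (demo_* names are hypothetical, and the big-endian conversion that iowrite64be() performs is left out for brevity); it is not the actual post_send_nop() code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for __iowrite64_copy(): copies 'count' 64-bit words and
 * issues no barrier of its own.
 */
static void demo_iowrite64_copy(volatile uint64_t *to, const uint64_t *from,
				size_t count)
{
	for (size_t i = 0; i < count; i++)
		to[i] = from[i];
}

/*
 * Stand-in for an ordered 64-bit MMIO write. The fence represents the
 * barrier that an iowrite64be()-style accessor issues by itself; the
 * byte swap is intentionally omitted in this sketch.
 */
static void demo_iowrite64_ordered(uint64_t val, volatile uint64_t *addr)
{
	__atomic_thread_fence(__ATOMIC_RELEASE);
	*addr = val;
}

/* Copy all but the last word without barriers, then one ordered write. */
static void demo_post_wqe(volatile uint64_t *dst, const uint64_t *wqe,
			  size_t qwords)
{
	if (qwords == 0)
		return;

	demo_iowrite64_copy(dst, wqe, qwords - 1);
	demo_iowrite64_ordered(wqe[qwords - 1], dst + (qwords - 1));
}
```

This keeps a single barrier per posted entry instead of one per 64-bit store inside the loop.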

> If you think we need something better than that, maybe having an
> iowrite64_copy() (without leading __) that includes a barrier would work.

It is only a one-off user and not on a critical path, so we can defer iowrite64_copy() for now.

Converting the mlx5_write64() variant to use writeX() and avoid wmb(), after the documentation update, is a good start.



