> From: Arnd Bergmann <arnd@xxxxxxxx>
> Sent: Friday, September 16, 2022 12:09 AM

[..]

> > I think it is worth because current documentation, indirectly (or
> > incorrectly) indicate that
> > "writel() does wmb() internally, so those drivers, who has difficulty
> > in using writel() can do, wmb() + raw write".
>
> I don't think it's wrong from a barrier perspective though:
> if a driver uses writel_relaxed(), then the only way to guarantee ordering is
> to have a full wmb() before it.
>
Sorry for the late response.
Yes, the idea is to avoid wmb() whenever it is not necessary.
I will update the example description to reflect that.

> > And I sort of see above pattern in two drivers, and it is not good.
> > It ends up doing dsb(st) on arm64, while needed barrier is only
> > dmb(oshst).
> >
> > So to fix those two drivers, it is better to first avoid wmb()
> > documentation reference when referring to writel().
>
> Yes, this suggestion is correct. On x86 and a few others, I think it's even
> worse when wmb() is an expensive barrier, while writel() is the same as
> writel_relaxed() and the barrier is implied by the MMIO access.
>
> It might help to spell this out and say that writel() is always preferred over
> wmb()+writel_relaxed().
>
True.

> Site note: there are several other problems with wmb()+__raw_writel(),
> which on many architectures does not guarantee any atomicity of the access
> (a word store could get split into four byte stores), breaks endianess
> assumptions and may still not provide the correct barrier semantics.
>
Hmm. So far I have not observed this on arm64, x86_64, or ppc64.
Maybe we don't see the byte stores because the address is aligned to 8 bytes?
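To make the cost difference discussed above concrete, here is a toy user-space model, not kernel code. It only counts which barrier each pattern would issue, using the arm64 mapping stated in this thread (wmb() ends up as dsb(st), while writel() needs only dmb(oshst)). Every model_* and ring_doorbell_* name, the counters, and fake_reg are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Toy model only: counts barriers instead of issuing them. */
struct barrier_counts {
	unsigned int dsb_st;    /* heavy barrier: what wmb() becomes on arm64 */
	unsigned int dmb_oshst; /* lighter barrier: all that writel() needs   */
};

static struct barrier_counts counts;
static volatile uint32_t fake_reg; /* stand-in for a device register */

/* Models wmb(): a full store barrier. */
static void model_wmb(void)
{
	counts.dsb_st++;
}

/* Models the barrier implied by writel() on arm64. */
static void model_io_wmb(void)
{
	counts.dmb_oshst++;
}

/* Models writel_relaxed(): the store alone, no ordering guarantee. */
static void model_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Models writel(): the lighter barrier followed by the store. */
static void model_writel(uint32_t val, volatile uint32_t *addr)
{
	model_io_wmb();
	model_writel_relaxed(val, addr);
}

/* Discouraged pattern: wmb() + relaxed write pays for the heavy barrier. */
static void ring_doorbell_discouraged(uint32_t val)
{
	model_wmb();
	model_writel_relaxed(val, &fake_reg);
}

/* Preferred pattern: writel() alone carries the ordering it needs. */
static void ring_doorbell_preferred(uint32_t val)
{
	model_writel(val, &fake_reg);
}
```

In this model the discouraged helper bumps only dsb_st and the preferred helper bumps only dmb_oshst, which is the whole point of preferring writel() over wmb()+writel_relaxed().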
> >> I see that there is more going on with that function, at least the
> >> loop in
> >> post_send_nop() probably just wants to use __iowrite64_copy(), but
> >> that also has no barrier in it, while changing mlx5_write64() to use
> >> iowrite64be() or similar would of course add excessive barriers inside of
> >> the loop.
> >
> > True. All other conversion seems possible.
> > For post_send_nop(), __iowmb() needs to be exposed, which is not
> > available today and it is only one-off user, I am inclined to keep
> > post_send_nop() as-is, but want to improve/correct rest of the
> > callers in these two drivers.
>
> __iowmb() is architecture-specific and does not have a well-defined
> behavior. wmb() is probably the best choice for post_send_nop().

Yes.

> Alternatively, one could use __iowrite64_copy() for the first few fields
> followed by a single writel64be for the last one.
>
__iowrite64_copy() seems the right fit for post_send_nop() compared to the current code.

> If you think we need something better than that, maybe having an
> iowrite64_copy() (without leading __) that includes a barrier would work.

It is only a one-off user and not a critical path, so we can defer iowrite64_copy() for now.
Converting mlx5_write64() to use writeX() and avoiding wmb(), once the documentation is updated, is a good start.
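For reference, the "copy the first fields without a barrier, then finish with one ordered write" idea quoted above could look roughly like the user-space model below. This is only a sketch of the shape, not the mlx5 code: model_iowrite64_copy() mimics the count-in-64-bit-units convention of __iowrite64_copy(), while model_write64_ordered(), model_post_nop(), fake_bf and the barrier counter are hypothetical names. Endianness conversion is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_BF_QWORDS 8

static unsigned int nop_barriers;                  /* modelled write barriers */
static volatile uint64_t fake_bf[FAKE_BF_QWORDS];  /* stand-in MMIO window    */

/* Models __iowrite64_copy(): plain 64-bit stores, no barrier at all. */
static void model_iowrite64_copy(volatile uint64_t *to, const uint64_t *from,
				 size_t count)
{
	for (size_t i = 0; i < count; i++)
		to[i] = from[i];
}

/* Models one barriered 64-bit MMIO write: barrier first, then the store. */
static void model_write64_ordered(uint64_t val, volatile uint64_t *addr)
{
	nop_barriers++;
	*addr = val;
}

/*
 * Copy all but the last qword without barriers, then issue a single
 * ordered write for the final qword. Returns 0 on success, -1 if the
 * request does not fit the modelled window.
 */
static int model_post_nop(const uint64_t *wqe, size_t qwords)
{
	if (qwords == 0 || qwords > FAKE_BF_QWORDS)
		return -1;

	model_iowrite64_copy(fake_bf, wqe, qwords - 1);
	model_write64_ordered(wqe[qwords - 1], &fake_bf[qwords - 1]);
	return 0;
}
```

With an 8-qword buffer this pays for one barrier in total instead of one per loop iteration, which is the concern raised about using a barriered accessor inside the loop.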