On Thu, Sep 15, 2022, at 6:35 PM, Parav Pandit wrote:
>> From: Arnd Bergmann <arnd@xxxxxxxx>
>> Sent: Thursday, September 15, 2022 11:16 AM
>> On Thu, Sep 15, 2022, at 4:18 PM, Parav Pandit wrote:
>> >
>> > So more accurate documentation is to say that 'when using writel() a
>> > prior IO barrier is not needed ...'
>> >
>> > How about that?
>>
>> That's probably fine, not sure if it's worth changing.
>>
> I think it is worth because current documentation, indirectly (or
> incorrectly) indicate that
> "writel() does wmb() internally, so those drivers, who has difficulty
> in using writel() can do, wmb() + raw write".

I don't think it's wrong from a barrier perspective though: if a driver
uses writel_relaxed(), then the only way to guarantee ordering is to
have a full wmb() before it.

> And I sort of see above pattern in two drivers, and it is not good.
> It ends up doing dsb(st) on arm64, while needed barrier is only
> dmb(oshst).
>
> So to fix those two drivers, it is better to first avoid wmb()
> documentation reference when referring to writel().

Yes, this suggestion is correct. On x86 and a few others, I think
it's even worse: wmb() is an expensive barrier, while writel()
is the same as writel_relaxed() and the barrier is implied by the
MMIO access.

It might help to spell this out and say that writel() is always preferred
over wmb()+writel_relaxed().

Side note: there are several other problems with wmb()+__raw_writel(),
which on many architectures does not guarantee any atomicity of the
access (a word store could get split into four byte stores), breaks
endianness assumptions and may still not provide the correct barrier
semantics.

>> I see that there is more going on with that function, at least the loop in
>> post_send_nop() probably just wants to use __iowrite64_copy(), but that
>> also has no barrier in it, while changing mlx5_write64() to use iowrite64be()
>> or similar would of course add excessive barriers inside of the loop.
>
> True.
> All other conversion seems possible.
> For post_send_nop(), __iowmb() needs to be exposed, which is not
> available today and it is only one-off user,
> I am inclined to keep post_send_nop() as-is, but want to
> improve/correct rest of the callers in these two drivers.

__iowmb() is architecture-specific and does not have well-defined
behavior. wmb() is probably the best choice for post_send_nop().
Alternatively, one could use __iowrite64_copy() for the first few fields,
followed by a single 64-bit big-endian write (e.g. iowrite64be()) for
the last one.

If you think we need something better than that, maybe
having an iowrite64_copy() (without leading __) that includes a
barrier would work.

      Arnd
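
To make the writel() vs. wmb()+writel_relaxed() comparison concrete, here
is a rough userspace model. It is only a sketch: the mock_* helpers and
the ring_doorbell_*() functions are hypothetical names made up for
illustration, not kernel APIs, and they merely record which operation a
driver would issue instead of executing real barriers. The arm64 mapping
in the comments is the one quoted above (dsb(st) for wmb(), dmb(oshst)
for the barrier writel() needs).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only. The mock_* helpers are hypothetical stand-ins
 * for the kernel's wmb(), writel_relaxed() and writel(); they log the
 * operation that would be issued and emit no real barrier.
 */
enum op {
	OP_WMB,        /* full write barrier, dsb(st) on arm64 */
	OP_IO_WMB,     /* lighter barrier writel() needs, dmb(oshst) on arm64 */
	OP_MMIO_STORE, /* the 32-bit store to the device register */
};

#define TRACE_MAX 8
enum op trace[TRACE_MAX];
int ntrace;

void record(enum op o)
{
	assert(ntrace < TRACE_MAX);
	trace[ntrace++] = o;
}

void mock_wmb(void)
{
	record(OP_WMB);
}

void mock_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
	record(OP_MMIO_STORE);
}

/* Modeled as "I/O write barrier, then relaxed store". */
void mock_writel(uint32_t val, volatile uint32_t *reg)
{
	record(OP_IO_WMB);
	mock_writel_relaxed(val, reg);
}

/* Pattern seen in the drivers under discussion: open-coded heavy barrier. */
void ring_doorbell_open_coded(uint32_t *desc, volatile uint32_t *db)
{
	*desc = 0x1234;              /* descriptor in normal memory */
	mock_wmb();                  /* stronger than required */
	mock_writel_relaxed(1, db);
}

/* Preferred pattern: let writel() supply the ordering it needs. */
void ring_doorbell_preferred(uint32_t *desc, volatile uint32_t *db)
{
	*desc = 0x1234;
	mock_writel(1, db);
}
```

Both variants end with the same register store after the descriptor
update; the only difference in the recorded trace is OP_WMB versus
OP_IO_WMB, which is where the extra cost of the open-coded form comes
from in this model.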