On 9/15/2023 8:21 AM, Serge Semin wrote:
...
> Based on the patch log and the comment, smp_wmb() seems to be more
> suitable here since the problem looks like SMP-specific. Most
> importantly the smp_wmb() will get to be just the compiler barrier on
> the UP system, so no cache and pipeline flushes in that case.
> Meanwhile I am not ARM expert, but based on the problem and the
> DMB/DSB barriers descriptions using DMB should be enough in your case
> since you only need memory syncs.
Hi Serge,
I looked at the definition of smp_wmb, and it looks like on arm64 it
uses a DMB barrier not a DSB barrier.
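In arch/arm/include/asm/barrier.h (the arm64 header,
arch/arm64/include/asm/barrier.h, likewise maps wmb() to dsb(st) and
__smp_wmb() to dmb(ishst)):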
...
#define __arm_heavy_mb(x...) dsb(x)
...
#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
...
#define wmb() __arm_heavy_mb(st)
...
#define __smp_wmb() dmb(ishst)
And then in /include/asm-generic/barrier.h it says:
#ifdef CONFIG_SMP
...
#ifndef smp_wmb
#define smp_wmb() do { kcsan_wmb(); __smp_wmb(); } while (0)
#endif
This looks like wmb() is a DSB and smp_wmb() is a DMB on SMP systems, so
the two functions are not equivalent on SMP systems.
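To make the expansion concrete, here is a small user-space model of the
macro layering quoted above. It is only a sketch: dsb() and dmb() are
replaced by hypothetical string-producing macros so the result can be
inspected, kcsan_wmb() is left out, and the GNU-style "x..." parameter
is written as __VA_ARGS__ so it builds as standard C. The helper names
wmb_insn() and smp_wmb_insn() are made up for this illustration.

```c
/* Model only: each barrier expands to a string naming the instruction
 * instead of emitting it, so the macro layering can be checked. */
#define dsb(opt) "dsb " #opt
#define dmb(opt) "dmb " #opt

/* Same layering as the header fragments quoted above. */
#define __arm_heavy_mb(...) dsb(__VA_ARGS__)
#define wmb()               __arm_heavy_mb(st)
#define __smp_wmb()         dmb(ishst)
#define smp_wmb()           __smp_wmb()   /* kcsan_wmb() omitted */

/* Hypothetical helpers that expose the expansions at run time. */
static const char *wmb_insn(void)
{
	return wmb();
}

static const char *smp_wmb_insn(void)
{
	return smp_wmb();
}
```

With CONFIG_SMP set, wmb_insn() names a DSB with the ST option and
smp_wmb_insn() names a DMB with the ISHST option, which is the
difference discussed here.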
So let's explore whether DMB or DSB is the correct barrier.
The ARM barrier docs I referred to have a specific example that says this:
"In some message passing systems, it is common for one observer to
update memory and then send an interrupt using a mailbox of some sort to
a second observer to indicate that memory has been updated and the new
contents have been read. Even though the sending of the interrupt using
a mailbox might be initiated using a memory access, a DSB barrier
must be used to ensure the completion of previous memory accesses.
Therefore the following sequence is needed to ensure that P2 sees the
updated value.
P1:
STR R5, [R1] ; message stored to shared memory location
DSB [ST]
STR R1, [R4] ; R4 contains the address of a mailbox
P2:
; interrupt service routine
LDR R5, [R1]
Even if R4 is a pointer to Strongly-Ordered memory, the update to R1
might not be visible without the DSB executed by P1.
It should be appreciated that these rules are required in connection to
the ARM Generic Interrupt Controller (GIC).
"
I don't fully understand why it needs to be a DSB and not just a
DMB, but this example matches what happens in the driver. The ARM docs
do some hand waving that DSB is required because of the GIC.
Unless we can come up with a reason why this example in the ARM barrier
docs is not a match for what happens in the i2c driver, ARM is
saying it has to be a DSB, not a DMB. If it needs to be a DSB, then
smp_wmb() is insufficient.
Does anybody else have a different interpretation of this section in the
ARM barrier docs? They use the word mailbox, and show a shared memory
write, an interrupt triggering write, and a read of shared memory on a
different core. Some would describe that as a software mailbox.
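For reference, the P1/P2 sequence from the ARM example can be sketched
in C roughly as follows. This is a user-space illustration under stated
assumptions, not the driver code: struct xfer_state, shared,
fake_mailbox, p1_send() and p2_isr() are hypothetical names,
fake_mailbox is ordinary memory standing in for the interrupt-raising
register, and on anything other than AArch64 the barrier falls back to
a generic full fence just so the sketch compiles.

```c
#include <stdint.h>

/* Stand-in for wmb(): "dsb st" on AArch64, a generic full fence
 * elsewhere (assumption made only so the sketch builds anywhere). */
#if defined(__aarch64__)
#define heavy_wmb() __asm__ volatile("dsb st" ::: "memory")
#else
#define heavy_wmb() __sync_synchronize()
#endif

/* Hypothetical shared state, written by P1 and read by P2. */
struct xfer_state {
	int msg_len;
};

static struct xfer_state shared;

/* Hypothetical mailbox; in real hardware this would be a device
 * register whose write raises an interrupt on another core. */
static volatile uint32_t fake_mailbox;

/* P1: store the message, complete the store, then ring the mailbox. */
static void p1_send(int len)
{
	shared.msg_len = len;   /* STR R5, [R1] */
	heavy_wmb();            /* DSB [ST]     */
	fake_mailbox = 1;       /* STR R1, [R4] */
}

/* P2: interrupt service routine reading the shared message.
 * Returns -1 if the mailbox was not rung. */
static int p2_isr(void)
{
	if (!fake_mailbox)
		return -1;
	return shared.msg_len;  /* LDR R5, [R1] */
}
```

Run on a single thread this only shows the order of the three steps. It
cannot reproduce the cross-core visibility problem the ARM text
describes.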
I did read someplace (although I don't have a specific reference I can
give) that ordering between normal memory writes is handled as a
different group than ordering between strongly-ordered accesses. The
excerpt from the ARM barrier document above does say "Even if R4 is a
pointer to Strongly-Ordered memory, the update to R1 might not be
visible without the DSB executed by P1", which implies a DMB is
insufficient to order normal memory writes against strongly-ordered
device memory writes.
I know that currently on ARM64 Windows, the low-level kernel device MMIO
access functions (like WRITE_REGISTER_ULONG) all have a DSB before the
MMIO memory access. That seems a little heavy-handed to me, but it may
also have been required to get all the current driver code written for
AMD/Intel processors to work correctly on ARM64 without adding barriers
in the drivers. There are also non-barrier variants that can be used if
a driver wants to optimize performance. Defaulting to correct operation
with minimal code changes would reduce the risk to delivery schedules.
Linux doesn't seem to make any attempt to have barriers in the low-level
MMIO access functions. If Linux had chosen to do that on ARM64, this
patch would not have been required. For a low-speed device like an i2c
controller, optimizing barriers likely makes little difference in
performance.
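The "barrier by default, relaxed variant on request" design described
above could look something like this. It is a sketch only:
reg_write32() and reg_write32_relaxed() are hypothetical names, not
Windows or Linux APIs, and the non-AArch64 fallback is a generic full
fence used purely so the code compiles on other hosts.

```c
#include <stdint.h>

/* Barrier issued before the register store: "dsb st" on AArch64,
 * a generic full fence elsewhere (assumption for portability). */
#if defined(__aarch64__)
#define pre_mmio_barrier() __asm__ volatile("dsb st" ::: "memory")
#else
#define pre_mmio_barrier() __sync_synchronize()
#endif

/* Relaxed variant: a plain volatile store with no barrier, for
 * callers that handle ordering themselves. */
static inline void reg_write32_relaxed(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Default variant: complete earlier stores before touching the
 * register, so callers get correct ordering without extra code. */
static inline void reg_write32(volatile uint32_t *reg, uint32_t val)
{
	pre_mmio_barrier();
	reg_write32_relaxed(reg, val);
}
```

A driver for a slow device would simply call reg_write32() everywhere
and accept the extra cycles.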
Let's look at it from a risk analysis viewpoint. If a DMB is sufficient
and we use the stronger DSB variant, the downside is that a few cpu
cycles will be wasted in i2c transfers. If we use a DMB when a DSB is
required for correct operation, the downside is that i2c operations may
malfunction. In this case, using a few extra cpu cycles for an operation
that does not happen at high frequency is lower risk than failures in
i2c transfers. If there is any uncertainty about which barrier type to
use, picking DSB over DMB would be better. We determined from the
include fragments above that wmb() gives the DSB and smp_wmb() does not.
Based on the above info, I think wmb() is still the correct function,
and a change to smp_wmb() would not be correct.
Sorry for the long message, I know some of you will be inspired to think
deeply about barriers, and some will be annoyed that I spent this much
space to explain how I came to the choice of wmb().
Thanks,
Jan