Hi Florian,
Very appreciate your explanation in detail.
Regards,
Peter
On 02/02/2015 03:25 PM, Florian Fainelli wrote:
On 02/02/15 14:06, Peter Oh wrote:
On 02/02/2015 11:47 AM, Johannes Berg wrote:
On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote:
On 02/02/2015 11:22 AM, Johannes Berg wrote:
You basically have the following sequence:
iowrite()
ioread()
If you look, you'll see that iowrite() is actually done (or should
be,
or perhaps with appropriate syncs) on an uncached mapping.
since it's mmio, iowrite will be map to write, not out which is
cached
mapping.
That's why we address "posted write" here.
If it's un-cached mapping which is volatile, we don't even need
ioread.
No, this isn't true - "posted write" in the context of this discussion
is about the PCIe bus. Memory writes that go through cache aren't
referred to as "posted writes", those are just (cached) memory writes.
As a result,
the only thing you care about here is the PCIe bus, not the CPU
cache
flush. And from there on that's just a question of PCIe bus
semantics.
So how does ioread guarantee PCIe bus transaction done?
That's how PCIe works, operations are serialized, and read() has to
wait
for a response from the device
do you know which mechanism or which instruction set makes read() wait
for a response from the device?
I have no idea. I assume it's just like a DRAM read, the CPU stalls
while there's no response.
My explanation in this thread is all about how read() guarantees the
wait for a response from the device, therefore why mb() - replace from
wmb at patch set 2 - is compatible to read().
Briefly speaking,
read() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus
-> cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe
device) signals write completion when write transactions completed in
write response channel -> cpu release axi bus -> cpu program counter
(pc) proceeds the next to read.
the exact same routines happen with mb().
mb() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus ->
cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe
device) signals write completion when write transactions completed in
write response channel -> cpu release axi bus -> cpu program counter
(pc) proceeds the next to read.
Since axi bus master is waiting (blocking) for write completion signal
from axi slave (PCIe device), this is how read() and mb() guarantee
write command reaches to the device.
PCIe writes are posted, so the only guarantee you can have by inserting
such barriers is that writes from CPU to the PCIe RC (targeting PCIe
device) is non-posted (as far as the busing between CPU and the PCIe RC
is concerned), but past the PCIe RC, there is no such guarantee, because
the PCIe specification allows for that and there is flow control, PCIe
switches or other things that can alter the way your PCIe device ends-up
being written to.
The only way to make a "portable" synchronization barrier is to do a
PCIe read from the same register you just wrote to, because then, the
PCIe RC needs to guarantee the transaction ordering on the PCIe bus itself.
You might just be lucky and/or have very good HW which ensures that the
ARM synchronization barriers are propagated to the memory region where
your PCIe device BARs are mapped from the CPU perspective, but you
definitively cannot rely on such assumptions, as there will be bogus HW
there, for which only a subsequent ioread32() will work.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html