Re: I/O and memory barriers

2010/6/7 luca ellero <lroluk@xxxxxxxxx>:
> Thanks again for your reply; I'm still confused, though. See inline
> comments.
>
> Pei Lin wrote:
>>
>> 2010/5/31 luca ellero <lroluk@xxxxxxxxx>:
>>
>>>
>>> Pei Lin wrote:
>>>
>>>>
>>>> 2010/5/17 luca ellero <lroluk@xxxxxxxxx>:
>>>>
>>>>
>>>>>
>>>>> Hi list,
>>>>> I have some (maybe stupid) questions which I can't answer even after
>>>>> reading
>>>>> lots of documentation.
>>>>> Suppose I have a PCI device which has some I/O registers mapped to
>>>>> memory
>>>>> (here I mean access are made through memory, not I/O space).
>>>>> As far as I know the right way to access them is through functions such
>>>>> as
>>>>> iowrite8 and friends:
>>>>>
>>>>> spin_lock(Q);
>>>>> iowrite8(some_address, ADDR);
>>>>> iowrite8(some_data, DATA);
>>>>> spin_unlock(Q);
>>>>>
>>>>> My questions are:
>>>>>
>>>>> 1) Do I need a write memory barrier (wmb) between the two iowrite8?
>>>>> I think I need it because I've read the implementation of iowrite8 and
>>>>> (in
>>>>> kernel 2.6.30.6) this expands to:
>>>>>
>>>>> void iowrite8(u8 val, void *addr)
>>>>> {
>>>>>  do {
>>>>>     unsigned long port = (unsigned long )addr;
>>>>>     if (port >= 0x40000UL) {
>>>>>         writeb(val, addr);
>>>>>     } else if (port > 0x10000UL) {
>>>>>         port &= 0x0ffffUL;
>>>>>         outb(val,port);
>>>>>     } else bad_io_access(port, "outb(val,port)" );
>>>>>  } while (0);
>>>>> }
>>>>>
>>>>> where writeb is:
>>>>>
>>>>> static inline void writeb(unsigned char val, volatile void *addr) {
>>>>>  asm volatile("movb %0,%1":
>>>>>     :"q" (val), "m" (*(volatile unsigned char *)addr)
>>>>>     :"memory");
>>>>> }
>>>>>
>>>>> which contains only a compiler barrier (the :"memory" clobber in the
>>>>> asm statement) but no CPU barrier. So, without wmb(), the CPU could
>>>>> reorder the two iowrite8 calls, with disastrous effects. Am I right?
>>>>>
>>>>>
>>>>> 2) Do I need mmiowb() before spin_unlock()?
>>>>> The documentation about mmiowb() really confuses me, so any explanation
>>>>> of its use is welcome.
>>>>>
>>>>>
>>>>
>>>> See the documentation, which explains it clearly:
>>>> http://lxr.linux.no/linux+v2.6.27.46/Documentation/memory-barriers.txt
>>>>
>>>> LOCKS VS I/O ACCESSES
>>>> ---------------------
>>>>
>>>> Under certain circumstances (especially involving NUMA), I/O accesses within
>>>> two spinlocked sections on two different CPUs may be seen as interleaved by the
>>>> PCI bridge, because the PCI bridge does not necessarily participate in the
>>>> cache-coherence protocol, and is therefore incapable of issuing the required
>>>> read memory barriers.
>>>>
>>>> For example:
>>>>
>>>>         CPU 1                           CPU 2
>>>>         =============================== ===============================
>>>>         spin_lock(Q)
>>>>         writel(0, ADDR)
>>>>         writel(1, DATA);
>>>>         spin_unlock(Q);
>>>>                                         spin_lock(Q);
>>>>                                         writel(4, ADDR);
>>>>                                         writel(5, DATA);
>>>>                                         spin_unlock(Q);
>>>>
>>>> may be seen by the PCI bridge as follows:
>>>>
>>>>         STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5
>>>>
>>>> which would probably cause the hardware to malfunction.
>>>>
>>>> What is necessary here is to intervene with an mmiowb() before dropping the
>>>> spinlock, for example:
>>>>
>>>>         CPU 1                           CPU 2
>>>>         =============================== ===============================
>>>>         spin_lock(Q)
>>>>         writel(0, ADDR)
>>>>         writel(1, DATA);
>>>>         mmiowb();
>>>>         spin_unlock(Q);
>>>>                                         spin_lock(Q);
>>>>                                         writel(4, ADDR);
>>>>                                         writel(5, DATA);
>>>>                                         mmiowb();
>>>>                                         spin_unlock(Q);
>>>>
>>>> this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
>>>> before either of the stores issued on CPU 2.
>>>>
>>>> Furthermore, following a store by a load from the same device obviates the need
>>>> for the mmiowb(), because the load forces the store to complete before the load
>>>> is performed:
>>>>
>>>>         CPU 1                           CPU 2
>>>>         =============================== ===============================
>>>>         spin_lock(Q)
>>>>         writel(0, ADDR)
>>>>         a = readl(DATA);
>>>>         spin_unlock(Q);
>>>>                                         spin_lock(Q);
>>>>                                         writel(4, ADDR);
>>>>                                         b = readl(DATA);
>>>>                                         spin_unlock(Q);
>>>>
>>>> See Documentation/DocBook/deviceiobook.tmpl for more information.
>>>>
>>>>
>>>
>>> Thanks for your reply.
>>> I've already read the documentation; what surprises me is that mmiowb()
>>> (at least on x86) is defined as a compiler barrier (barrier()) and
>>> nothing else. I would expect it to do something more than that: some
>>> specific PCI command, or at least a dummy read from some PCI register
>>> (since a read forces the store to complete).
>>>
>>
>> On MIPS it is defined as:
>> /* Depends on MIPS II instruction set */
>> #define mmiowb() asm volatile ("sync" ::: "memory")
>>
>> On x86:
>> #define mb()    asm volatile("mfence":::"memory")
>> #define rmb()   asm volatile("lfence":::"memory")
>> #define wmb()   asm volatile("sfence" ::: "memory")
>> On x86, the mfence/lfence/sfence instructions are used to guarantee
>> ordering.
>>
>
> That's not true. I stand by my previous assertion: on x86, mmiowb()
> doesn't use any mfence/lfence/sfence; it's only a compiler barrier:

Look at the e-mail I referenced:
"Now, on x86, the CPU actually tends to order IO writes *more* than it
orders any other writes (they are mostly entirely synchronous, unless the
area has been marked as write merging), but at least on PPC, it's the
other way around: without the cache as a serialization entry, you end up
having a totally separate queue to serialize, and a regular-memory write
barrier does nothing at all to the IO queue."
So on x86, mmiowb() is defined only as asm volatile ("" ::: "memory").
In other words, x86 can already guarantee the ordering of I/O writes, I
think.

> See arch\x86\include\asm\io.h:
> #define mmiowb() barrier()
>
>
>> I found an old mail discussion of mmiowb() usage:
>> http://www.gelato.unsw.edu.au/archives/linux-ia64/0708/21096.html
>> From: Nick Piggin <npiggin_at_suse.de>
>> Date: 2007-08-24 12:59:16
>> On Thu, Aug 23, 2007 at 09:16:42AM -0700, Linus Torvalds wrote:
>>
>>>
>>> On Thu, 23 Aug 2007, Nick Piggin wrote:
>>>
>>>>
>>>> Also, FWIW, there are some advantages of deferring the mmiowb thingy
>>>> until the point of unlock.
>>>>
>>>
>>> And that is exactly what ppc64 does.
>>>
>>> But you're missing a big point: for 99.9% of all hardware, mmiowb() is a
>>> total no-op. So when you talk about "advantages", you're not talking
>>> about
>>> any *real* advantage, are you?
>>>
>>
>>
>>
>>>
>>> Furthermore, a lot of PCI drivers seem to ignore its use.
>>> Can you explain that to me?
>>>
>>
>> I only found one link which may explain why many drivers removed
>> mmiowb():
>> http://lwn.net/Articles/283776/
>>
>>
>
> As far as I can see in the 2.6.33 code, that patch was not applied to the
> vanilla kernel source, so that's not the point.
> Regards
> Luca
>
>
>
>



-- 
Best Regards
Lin

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ



