Re: I/O and memory barriers

2010/5/31 luca ellero <lroluk@xxxxxxxxx>:
> Pei Lin wrote:
>>
>> 2010/5/17 luca ellero <lroluk@xxxxxxxxx>:
>>
>>>
>>> Hi list,
>>> I have some (maybe stupid) questions which I can't answer even after
>>> reading
>>> lots of documentation.
>>> Suppose I have a PCI device which has some I/O registers mapped to memory
>>> (here I mean access are made through memory, not I/O space).
>>> As far as I know the right way to access them is through functions such
>>> as
>>> iowrite8 and friends:
>>>
>>> spin_lock(Q)
>>> iowrite8(some_address, ADDR)
>>> iowrite8(some_data, DATA);
>>> spin_unlock(Q);
>>>
>>> My questions are:
>>>
>>> 1) Do I need a write memory barrier (wmb) between the two iowrite8?
>>> I think I need it because I've read the implementation of iowrite8 and
>>> (in
>>> kernel 2.6.30.6) this expands to:
>>>
>>> void iowrite8(u8 val, void *addr)
>>> {
>>>  do {
>>>      unsigned long port = (unsigned long )addr;
>>>      if (port >= 0x40000UL) {
>>>          writeb(val, addr);
>>>      } else if (port > 0x10000UL) {
>>>          port &= 0x0ffffUL;
>>>          outb(val,port);
>>>      } else bad_io_access(port, "outb(val,port)" );
>>>  } while (0);
>>> }
>>>
>>> where writeb is:
>>>
>>> static inline void writeb(unsigned char val, volatile void *addr) {
>>>  asm volatile("movb %0,%1":
>>>      :"q" (val), "m" (*(volatile unsigned char *)addr)
>>>      :"memory");
>>> }
>>>
>>> which contains only a compiler barrier (the :"memory" in the asm
>>> statement)
>>> but no CPU barrier. So, without wmb(), CPU can reorder the iowrite8 with
>>> disastrous effect. Am I right?
>>>
>>>
>>> 2) Do I need mmiowb() before spin_unlock()?
>>> The documentation about mmiowb() is really confusing me, so any
>>> explanation about its use is really welcome.
>>>
>>
>> See the documentation, which explains it clearly:
>> http://lxr.linux.no/linux+v2.6.27.46/Documentation/memory-barriers.txt
>>
>> LOCKS VS I/O ACCESSES
>> ---------------------
>>
>> Under certain circumstances (especially involving NUMA), I/O accesses within
>> two spinlocked sections on two different CPUs may be seen as interleaved by the
>> PCI bridge, because the PCI bridge does not necessarily participate in the
>> cache-coherence protocol, and is therefore incapable of issuing the required
>> read memory barriers.
>>
>> For example:
>>
>>         CPU 1                           CPU 2
>>         =============================== ===============================
>>         spin_lock(Q)
>>         writel(0, ADDR)
>>         writel(1, DATA);
>>         spin_unlock(Q);
>>                                         spin_lock(Q);
>>                                         writel(4, ADDR);
>>                                         writel(5, DATA);
>>                                         spin_unlock(Q);
>>
>> may be seen by the PCI bridge as follows:
>>
>>         STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5
>>
>> which would probably cause the hardware to malfunction.
>>
>> What is necessary here is to intervene with an mmiowb() before dropping the
>> spinlock, for example:
>>
>>         CPU 1                           CPU 2
>>         =============================== ===============================
>>         spin_lock(Q)
>>         writel(0, ADDR)
>>         writel(1, DATA);
>>         mmiowb();
>>         spin_unlock(Q);
>>                                         spin_lock(Q);
>>                                         writel(4, ADDR);
>>                                         writel(5, DATA);
>>                                         mmiowb();
>>                                         spin_unlock(Q);
>>
>> this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
>> before either of the stores issued on CPU 2.
>>
>> Furthermore, following a store by a load from the same device obviates the need
>> for the mmiowb(), because the load forces the store to complete before the load
>> is performed:
>>
>>         CPU 1                           CPU 2
>>         =============================== ===============================
>>         spin_lock(Q)
>>         writel(0, ADDR)
>>         a = readl(DATA);
>>         spin_unlock(Q);
>>                                         spin_lock(Q);
>>                                         writel(4, ADDR);
>>                                         b = readl(DATA);
>>                                         spin_unlock(Q);
>>
>> See Documentation/DocBook/deviceiobook.tmpl for more information.
>>
>
> Thanks for your reply.
> I've already read the documentation; anyway, what surprises me is the fact
> that mmiowb() (at least on x86) is defined as a compiler barrier (barrier())
> and nothing else. I would expect it to do something more than that: some
> specific PCI command, or at least a dummy "read" from some PCI register
> (since a read forces the store to complete).
On MIPS, mmiowb() is defined as:

 /* Depends on MIPS II instruction set */
 #define mmiowb() asm volatile ("sync" ::: "memory")

For x86, the generic barriers are:

 #define mb()    asm volatile("mfence" ::: "memory")
 #define rmb()   asm volatile("lfence" ::: "memory")
 #define wmb()   asm volatile("sfence" ::: "memory")

so on x86 the mfence/lfence/sfence instructions are what guarantee ordering.
i found an old mail discussion for the mmiowb() usage.
http://www.gelato.unsw.edu.au/archives/linux-ia64/0708/21056.html
http://www.gelato.unsw.edu.au/archives/linux-ia64/0708/21096.html
From: Nick Piggin <npiggin_at_suse.de>
Date: 2007-08-24 12:59:16
On Thu, Aug 23, 2007 at 09:16:42AM -0700, Linus Torvalds wrote:
>
>
> On Thu, 23 Aug 2007, Nick Piggin wrote:
> >
> > Also, FWIW, there are some advantages of deferring the mmiowb thingy
> > until the point of unlock.
>
> And that is exactly what ppc64 does.
>
> But you're missing a big point: for 99.9% of all hardware, mmiowb() is a
> total no-op. So when you talk about "advantages", you're not talking about
> any *real* advantage, are you?


> Furthermore, a lot of PCI drivers seem to ignore its use.
> Can you explain that to me?
I found only one link which may explain why many drivers removed mmiowb():
http://lwn.net/Articles/283776/

> Luca
>
>
>



-- 
Best Regards
Lin

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ