On 3/24/2018 10:30 AM, Chopra, Manish wrote:
>> -----Original Message-----
>> From: Sinan Kaya [mailto:okaya@xxxxxxxxxxxxxx]
>> Sent: Friday, March 23, 2018 10:44 PM
>> To: David Miller <davem@xxxxxxxxxxxxx>
>> Cc: netdev@xxxxxxxxxxxxxxx; timur@xxxxxxxxxxxxxx; sulrich@xxxxxxxxxxxxxx;
>> linux-arm-msm@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
>> Elior, Ariel <Ariel.Elior@xxxxxxxxxx>;
>> Dept-Eng Everest Linux L2 <Dept-EngEverestLinuxL2@xxxxxxxxxx>;
>> linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on
>> weakly-ordered archs
>>
>> On 3/23/2018 1:04 PM, David Miller wrote:
>>> From: Sinan Kaya <okaya@xxxxxxxxxxxxxx>
>>> Date: Fri, 23 Mar 2018 12:51:47 -0400
>>>
>>>> It could if txdata->tx_db was not a union. There is a data dependency
>>>> between txdata->tx_db.data.prod and txdata->tx_db.raw.
>>>>
>>>> So, no reordering.
>>>
>>> I don't see it that way, the code requires that:
>>>
>>>	txdata->tx_db.data.prod += nbd;
>>>
>>> is visible before the doorbell update.
>>>
>>> barrier() doesn't provide that.
>>>
>>> Neither does writel_relaxed(). However plain writel() does.
>>
>> Correct for some architectures including ARM but not correct universally.
>>
>> writel() just guarantees register read/writes before and after to be ordered
>> when HW observes it.
>>
>> writel() doesn't guarantee that the memory update is visible to the HW on all
>> architectures.
>>
>> If you need memory update visibility, that barrier() should have been a
>> wmb()
>>
>> A correct multi-arch pattern is
>>
>>	wmb()
>>	writel_relaxed()
>>	mmiowb()
>>
>
> Sinan, since you have mentioned the use of mmiowb() here after
> writel_relaxed(): I believe this is not always correct for all types of
> IO mapped memory [especially if the IO memory is mapped write combined
> (e.g. with ioremap_wc())].
> We have a current issue in our NIC (qede) driver on x86, for which a patch
> was already sent more than a week ago [still waiting to hear from David on
> that], where mmiowb() seems to be useless since we use a write-combined
> mapped doorbell and mmiowb() just seems to be a compiler barrier() there.
> So in order to flush the write-combined buffer we really need
> writel_relaxed() followed by a wmb() to synchronize writes among CPU cores.
> I think the correct pattern in such cases (for write-combined IO) would
> have been like below -
>
>	wmb();
>	writel_relaxed();
>	wmb(); -> To actually flush the writes.

You actually have good points. It is the same problem as with the barrier()
description above.

The answer really depends on what you are doing/expecting after mmiowb().

If you expect some memory content to be observed by HW, you definitely need
a wmb() like you mentioned.

If you just want writes to be flushed but you don't expect any memory
content to be updated, you need a mmiowb().

https://lwn.net/Articles/198988/

"There is mmiowb(), but its real purpose is to enforce ordering between MMIO
operations only."

>
> Thanks.
>

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
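
For reference, the two orderings discussed in this thread can be put side by side in a rough userspace sketch. This is not kernel code: wmb(), mmiowb() and writel_relaxed() below are stand-ins that only log their names (the fences inside them are an approximation, not the architecture-specific kernel semantics), and struct txq, ring_doorbell() and ring_doorbell_wc() are hypothetical names invented for the illustration, not taken from bnx2x or qede.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-ins, NOT the kernel implementations: each one logs its
 * name so the call order can be inspected. The fences are only a rough
 * approximation; real wmb()/mmiowb() behavior is architecture specific.
 */
#define TRACE_MAX 16
static const char *trace[TRACE_MAX];
static int trace_len;

static void record(const char *what)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = what;
}

/* Returns 1 if the i-th recorded step has the given name. */
static int trace_is(int i, const char *what)
{
	return i >= 0 && i < trace_len && strcmp(trace[i], what) == 0;
}

static void wmb(void)
{
	atomic_thread_fence(memory_order_release);
	record("wmb");
}

static void mmiowb(void)
{
	atomic_thread_fence(memory_order_release);
	record("mmiowb");
}

static void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	record("writel_relaxed");
}

/* Hypothetical TX queue: producer index in memory plus a doorbell register. */
struct txq {
	uint32_t prod;              /* read by the device from memory */
	volatile uint32_t doorbell; /* stands in for the MMIO doorbell */
};

/* Ordering proposed in the thread for a regular (non-WC) mapping. */
static void ring_doorbell(struct txq *q, uint32_t nbd)
{
	q->prod += nbd;
	record("prod update");
	wmb();                                 /* memory update before doorbell */
	writel_relaxed(q->prod, &q->doorbell);
	mmiowb();                              /* order against later MMIO */
}

/* Ordering suggested in the thread for a write-combined doorbell mapping. */
static void ring_doorbell_wc(struct txq *q, uint32_t nbd)
{
	q->prod += nbd;
	record("prod update");
	wmb();                                 /* memory update before doorbell */
	writel_relaxed(q->prod, &q->doorbell);
	wmb();                                 /* meant to flush the WC buffer */
}
```

The only difference between the two helpers is the barrier issued after the doorbell write, which is exactly the point under discussion; whether mmiowb() or a second wmb() is needed depends on the mapping type and on what the device is expected to observe afterwards.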