On Tue, Mar 27, 2018 at 8:54 PM, Alexander Duyck <alexander.duyck@xxxxxxxxx> wrote:
> On Tue, Mar 27, 2018 at 8:10 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
>>>
>>> Sinan
>>> "We are being told that if you use writel(), then you don't need a wmb() on
>>> all architectures."
>>>
>>> Alex:
>>> "I'm not sure who told you that but that is incorrect, at least for
>>> x86. If you attempt to use writel() without the wmb() we will have to
>>> NAK the patches. We will accept the wmb() with writel_relaxed() since
>>> that solves things for ARM."
>>>
>>> > Jason is seeking behavior clarification for write combined buffers.
>>>
>>> Alex:
>>> "Don't bother. I can tell you right now that for x86 you have to have a
>>> wmb() before the writel().
>>
>> To clarify: are you saying that on x86 you need a wmb() prior to a writel
>> if you want that writel to be ordered after prior writes to memory? Is this
>> specific to WC memory or some other non-standard attribute?
>
> Note, I am not a CPU guy so this is just my interpretation. It is my
> understanding that the wmb(), aka sfence, is needed on x86 to sort out
> writes between Write-back(WB) system memory and Strong Uncacheable
> (UC) MMIO accesses.
>
> I was hoping to be able to cite something in the software developers
> manual (https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf),
> but that tends to be pretty vague. I have re-read section 22.34
> (volume 3B) several times and I am still not clear on if it says we
> need the sfence or not. It is a matter of figuring out what the impact
> of store buffers and caching are for WB versus UC memory.

Here is what I found regarding the store buffer in that document:

11.10 STORE BUFFER

Intel 64 and IA-32 processors temporarily store each write (store) to
memory in a store buffer.
The store buffer improves processor performance by allowing the
processor to continue executing instructions without having to wait
until a write to memory and/or to a cache is complete. It also allows
writes to be delayed for more efficient use of memory-access bus cycles.

In general, the existence of the store buffer is transparent to
software, even in systems that use multiple processors. The processor
ensures that write operations are always carried out in program order.
It also insures that the contents of the store buffer are always drained
to memory in the following situations:

• When an exception or interrupt is generated.
• (P6 and more recent processor families only) When a serializing
  instruction is executed.
• When an I/O instruction is executed.
• When a LOCK operation is performed.
• (P6 and more recent processor families only) When a BINIT operation
  is performed.
• (Pentium III, and more recent processor families only) When using an
  SFENCE instruction to order stores.
• (Pentium 4 and more recent processor families only) When using an
  MFENCE instruction to order stores.

The discussion of write ordering in Section 8.2, “Memory Ordering,”
gives a detailed description of the operation of the store buffer.

       Arnd