On Thu, 2015-07-02 at 20:49 +0200, Luis R. Rodriguez wrote:
> > The question then is what is "the right thing". In the powerpc case,
> > we'll have a non-guarded mapping, which means we also get no ordering
> > between load and stores.
>
> I don't follow, you *ordering* between load and stores for WC? We should
> not need that for WC, it's why WC is used for only very specific things
> such as framebuffer and PIO (which BTW I still don't quite get all this
> use case for infiniband to be honest, and I will note I do see some
> proprietary hardware extensions like bursts but nothing covering all
> this in a general doc, I think it all just has to do that this
> is a hardware hack in reality, which we sell as a feature).

Well, that's the problem: the semantics that we provide to drivers aren't
well defined.

The words "write combine" themselves only specify that writes to
subsequent addresses can be combined into larger transactions. That in
itself is already quite vague (are there boundaries, limits? some can
depend on bus type, etc...) though in practice it is probably sufficient.

However, overloading a _wc mapping with additional memory model
differences such as loss of ordering between loads and stores, etc... is
not an obvious thing to do.

I agree it would make *my* life easier if we did it, since this is
precisely the semantics provided by a "G=0" mapping on ppc, but we need to
agree and *document* it, otherwise bad things will happen eventually.

We also need to document in that case which barriers can be used to
explicitly enforce the ordering on such a mapping and which barriers can
be used to break write combine (they don't necessarily have to be the
same).

We also need to document which accessors will actually provide the write
combine "feature" of a _wc mapping.
For example, while writel() will do it on Intel, it will not on ppc, and I
wouldn't be surprised if a bunch of other archs fall in the same bucket as
ppc (basically anything that has barriers in its writel() implementation
to order vs. DMA etc...). So we might need to explicitly document that
writel_relaxed() needs to be used.

Finally, what are the precise guaranteed semantics of writeX/readX,
writeX_relaxed/readX_relaxed and __raw_ (everything else) on a _wc
mapping? What do we mandate and document, and what do we leave to be
implementation dependent?

Cheers,
Ben.