On Sat, Aug 1, 2015 at 9:49 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> My simplistic mental picture while thinking of this is the IO range
> where you send the commands to the device and you don't really want to
> delay those but they should reach the device as they get issued.

Well, even for command streams, people often do go for a write-combining
approach, simply because it is *so* much more efficient on the bus to
buffer and burst things.

The interface is set up to not really "combine" things in the
over-writing sense, but just in the "combine contiguous writes into
bigger buffers on the CPU, and then write it out as efficiently as
possible" sense.

Of course, the device (and the driver) has to be designed properly for
that, and it makes sense only with certain kinds of models, but it can
actually be much more efficient to make the device interface be
something like "write 32-byte command packets to a circular
write-combining buffer" than it is to do things other ways. Back in the
day, that was one of the most efficient ways to try to fill up the PCI
bandwidth.

There are other approaches too, of course, with the modern variation
tending to be "the device does all real accesses by reading over DMA,
and the only time you use IO accesses is for setup and as a 'start your
DMA transfers now' kind of interface".

But write-combining MMIO used to be a very common model for
high-performance IO not that long ago, because DMA didn't actually use
to be all that efficient at all (nasty behavior with caches and snooping
etc back before the memory controller was on-die and DMA accesses
snooped caches directly). So the "DMA is efficient even for smaller
things" thing is relatively recent.

                Linus
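
To make the "write 32-byte command packets to a circular write-combining
buffer" model a bit more concrete, here is a minimal user-space sketch in
C. It is only an illustration under stated assumptions, not how any
particular device or driver does it: the names (struct cmd_ring,
cmd_ring_push, cmd_ring_kick, cmd_ring_device_drain), the ring depth, and
the doorbell field are all hypothetical, and plain memory stands in for the
MMIO window. In a real Linux driver the ring would presumably live in a
mapping obtained with something like ioremap_wc(), and a write barrier
such as wmb() would be needed before the doorbell write so the combined
stores actually reach the device first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMD_PACKET_SIZE 32u /* bytes per command packet, as in the text */
#define CMD_RING_SLOTS  8u  /* hypothetical ring depth (power of two) */

/*
 * Hypothetical device command ring. Plain memory stands in for the
 * write-combining MMIO window so the sketch can run in user space.
 */
struct cmd_ring {
    uint8_t  slots[CMD_RING_SLOTS][CMD_PACKET_SIZE];
    uint32_t head;     /* next slot the driver fills */
    uint32_t tail;     /* next slot the simulated device consumes */
    uint32_t doorbell; /* stand-in for a "new head" device register */
};

static bool cmd_ring_full(const struct cmd_ring *r)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    return r->head - r->tail == CMD_RING_SLOTS;
}

/*
 * Queue one command. Short payloads are zero-padded so that every push
 * writes one whole, contiguous 32-byte slot and never rewrites part of it.
 */
static bool cmd_ring_push(struct cmd_ring *r, const void *cmd, size_t len)
{
    uint8_t packet[CMD_PACKET_SIZE] = {0};

    assert(len <= CMD_PACKET_SIZE);
    if (cmd_ring_full(r))
        return false;

    memcpy(packet, cmd, len);
    /* One contiguous full-slot copy: the pattern WC buffers can burst. */
    memcpy(r->slots[r->head % CMD_RING_SLOTS], packet, CMD_PACKET_SIZE);
    r->head++;
    return true;
}

/*
 * Publish the queued packets. Assumption: on real hardware a write
 * barrier would go here, before the doorbell store.
 */
static void cmd_ring_kick(struct cmd_ring *r)
{
    r->doorbell = r->head;
}

/* Simulated device side: consume everything up to the doorbell value. */
static uint32_t cmd_ring_device_drain(struct cmd_ring *r)
{
    uint32_t consumed = r->doorbell - r->tail;

    r->tail = r->doorbell;
    return consumed;
}
```

The driver side would call cmd_ring_push() for each command and then
cmd_ring_kick() once per batch, so many packet writes are followed by a
single doorbell access. The point of the fixed-size, append-only layout is
that the CPU only ever issues sequential full-slot stores, which is the
case write-combining handles well.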