[disclaimer: I've been involved with ZONE_DEVICE support and the pmem
driver, wrote parts of the code, and discussed a lot of the tradeoffs
in how we handle I/O to memory in BARs]

On Tue, Feb 16, 2016 at 08:13:58PM -0800, davide rossetti wrote:
> 1) I see mm as appropriate for real memory, i.e. something that
> user-space apps can pass around.

mm is memory management, and this clearly falls under that umbrella,
so it absolutely needs to be under mm/ and reviewed by the linux-mm
crowd.

> This is not totally true for BAR
> memory, for instance:
> a) as long as CPU initiated atomic ops are not supported on BAR space
> of PCIe devices.
> b) OTOT, CPU reading from BAR is awful (BW being abysmal,~10MB/s),
> while high BW writing requires use of vector instructions (at least on
> x86_64).
> Bottom line is, BAR mappings are not like plain memory.

That doesn't change how they are managed. We've always supported
mapping BARs to userspace in various drivers, and the only real news
with things like the pmem driver with DAX, or some of the things people
want to do with the NVMe controller memory buffer, is that there are
much bigger quantities of it, and:

 a) people want to be able to have cacheable mappings of various kinds
    instead of the old uncacheable default.
 b) we want to be able to DMA (including RDMA) to the regions in the
    BARs.

a) is something that needs smaller amounts of work in all kinds of
areas to be done properly, but in principle GPU drivers have been doing
this forever using all kinds of hacks.

b) is the real issue. The Linux DMA support code doesn't really
operate on just physical addresses, but on page structures, and we
don't allocate page structures for BARs. We investigated two ways to
address this: 1) allow DMA operations without struct page and 2) create
struct page structures for BARs that we want to be able to use DMA
operations on. For various reasons option 2) was favored, and this is
how we ended up with ZONE_DEVICE.
Read the linux-mm and linux-nvdimm lists for the lengthy discussions of
how we ended up here.

Additional issues like which instructions to use for access build on
top of these basic building blocks.

> 2) Instead, I see appropriate that two sophisticated devices, like an
> IB NIC and a storage/accelerator device, can freely target each other
> for I/O, i.e. exchanging peer-to-peer PCIe transactions. And as long
> as the existing sophisticated initiators are confined to the RDMA
> subsystem, that is where this support belongs to.

It doesn't. There is absolutely nothing RDMA specific here - please
work with the overall community to do the right thing here.