On Fri, Sep 11, 2020 at 10:39:16AM +1000, Benjamin Herrenschmidt wrote:
> > > > > That's why I looped you in - that's what worries me about
> > > > > "enabling"
> > > > > arch_can_pci_mmap_wc() on arm64. If we enable it and we have perf
> > > > > regressions that's not OK.
> > > > >
> > > > > Or we *can* enable arch_can_pci_mmap_wc() but force the mellanox
> > > > > driver (or more broadly all drivers following this message push
> > > > > semantics) to use "something else" for WC detection.
> > > >
> > > > arch_can_pci_mmap_wc() really only controls the sysfs resource file
> > > > and it seems very unclear who in userspace uses that these days.
> > >
> > > dpdk under some circumstances afaik.
> >
> > And something gross for DMA then? Not sure dpdk is useful without
> > DMA. Why not use CONFIG_VFIO_NOIOMMU for such a non-secure thing?
>
> Clint, can you elaborate on the use case ?
>

The use-case I'm targeting is the ENA pmd in DPDK. For performance
reasons (many of which are very similar to what Jason has described for
mlx5), we need to generate full-sized TLPs instead of many partial TLPs
to improve efficiency.

Here's an excerpt describing the write-combine usage from
./Documentation/networking/device_drivers/ethernet/amazon/ena.rst:

  - Low Latency Queue (LLQ) mode or "push-mode".

    * In this mode the driver pushes the transmit descriptors and the
      first 128 bytes of the packet directly to the ENA device memory
      space. The rest of the packet payload is fetched by the device.
      For this operation mode, the driver uses a dedicated PCI device
      memory BAR, which is mapped with write-combine capability.

There's no DMA involved with this BAR -- the driver writes a portion of
the packet contents in addition to the descriptors, which generally
increases the number of TLPs if write-combine isn't used. Furthermore,
this BAR is only used for writes and never for reads.

As Jason noted in the other reply to this email, the Linux ENA driver
makes use of WC by using devm_ioremap_wc().

The DPDK code here uses the same mechanism in user-space to enable
write-combining by mapping the resourceX_wc file if the driver uses
RTE_PCI_DRV_WC_ACTIVATE.

Clint
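
For illustration, the user-space side described above (mmap of the
sysfs resourceX_wc file) can be sketched roughly as below. This is not
the DPDK implementation; the helper names wc_resource_path() and
map_bar_wc() are made up for this example, and it assumes that sysfs
reports the BAR length as the file size and that the caller has
permission to open the file read-write.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the sysfs path of the write-combining
 * resource file for one BAR of a PCI device given as "dddd:bb:dd.f".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int wc_resource_path(char *buf, size_t len, const char *bdf, int bar)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d_wc",
			 bdf, bar);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: map the whole BAR for writing through its
 * resourceN_wc file. Returns MAP_FAILED on any error (for example when
 * the file does not exist because the architecture or the BAR does not
 * offer a WC mapping). On success *size_out holds the mapped length.
 */
static void *map_bar_wc(const char *bdf, int bar, size_t *size_out)
{
	char path[128];
	struct stat st;
	void *p;
	int fd;

	if (wc_resource_path(path, sizeof(path), bdf, bar) != 0)
		return MAP_FAILED;

	/* A shared writable mapping needs the file opened read-write. */
	fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* Assumption: the sysfs file size equals the BAR length. */
	if (fstat(fd, &st) != 0 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}

	/* Write-only use, matching the "never read" BAR described above. */
	p = mmap(NULL, (size_t)st.st_size, PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */

	if (p != MAP_FAILED)
		*size_out = (size_t)st.st_size;
	return p;
}
```

A caller would fall back to the plain resourceN file (or give up on
push-mode) when map_bar_wc() returns MAP_FAILED, and munmap() the
region with the returned size when done.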