On Tue, 8 May 2018 17:25:24 -0400
Don Dutile <ddutile@xxxxxxxxxx> wrote:

> On 05/08/2018 12:57 PM, Alex Williamson wrote:
> > On Mon, 7 May 2018 18:23:46 -0500
> > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> >> On Mon, Apr 23, 2018 at 05:30:32PM -0600, Logan Gunthorpe wrote:
> >>> Hi Everyone,
> >>>
> >>> Here's v4 of our series to introduce P2P based copy offload to NVMe
> >>> fabrics. This version has been rebased onto v4.17-rc2. A git repo
> >>> is here:
> >>>
> >>> https://github.com/sbates130272/linux-p2pmem pci-p2p-v4
> >>> ...
> >>
> >>> Logan Gunthorpe (14):
> >>>   PCI/P2PDMA: Support peer-to-peer memory
> >>>   PCI/P2PDMA: Add sysfs group to display p2pmem stats
> >>>   PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset
> >>>   PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
> >>>   docs-rst: Add a new directory for PCI documentation
> >>>   PCI/P2PDMA: Add P2P DMA driver writer's documentation
> >>>   block: Introduce PCI P2P flags for request and request queue
> >>>   IB/core: Ensure we map P2P memory correctly in
> >>>     rdma_rw_ctx_[init|destroy]()
> >>>   nvme-pci: Use PCI p2pmem subsystem to manage the CMB
> >>>   nvme-pci: Add support for P2P memory in requests
> >>>   nvme-pci: Add a quirk for a pseudo CMB
> >>>   nvmet: Introduce helper functions to allocate and free request SGLs
> >>>   nvmet-rdma: Use new SGL alloc/free helper for requests
> >>>   nvmet: Optionally use PCI P2P memory
> >>>
> >>>  Documentation/ABI/testing/sysfs-bus-pci    |  25 +
> >>>  Documentation/PCI/index.rst                |  14 +
> >>>  Documentation/driver-api/index.rst         |   2 +-
> >>>  Documentation/driver-api/pci/index.rst     |  20 +
> >>>  Documentation/driver-api/pci/p2pdma.rst    | 166 ++++++
> >>>  Documentation/driver-api/{ => pci}/pci.rst |   0
> >>>  Documentation/index.rst                    |   3 +-
> >>>  block/blk-core.c                           |   3 +
> >>>  drivers/infiniband/core/rw.c               |  13 +-
> >>>  drivers/nvme/host/core.c                   |   4 +
> >>>  drivers/nvme/host/nvme.h                   |   8 +
> >>>  drivers/nvme/host/pci.c                    | 118 +++--
> >>>  drivers/nvme/target/configfs.c             |  67 +++
> >>>  drivers/nvme/target/core.c                 | 143 ++++-
> >>>  drivers/nvme/target/io-cmd.c               |   3 +
> >>>  drivers/nvme/target/nvmet.h                |  15 +
> >>>  drivers/nvme/target/rdma.c                 |  22 +-
> >>>  drivers/pci/Kconfig                        |  26 +
> >>>  drivers/pci/Makefile                       |   1 +
> >>>  drivers/pci/p2pdma.c                       | 814 +++++++++++++++++++++++++++++
> >>>  drivers/pci/pci.c                          |   6 +
> >>>  include/linux/blk_types.h                  |  18 +-
> >>>  include/linux/blkdev.h                     |   3 +
> >>>  include/linux/memremap.h                   |  19 +
> >>>  include/linux/pci-p2pdma.h                 | 118 +++++
> >>>  include/linux/pci.h                        |   4 +
> >>>  26 files changed, 1579 insertions(+), 56 deletions(-)
> >>>  create mode 100644 Documentation/PCI/index.rst
> >>>  create mode 100644 Documentation/driver-api/pci/index.rst
> >>>  create mode 100644 Documentation/driver-api/pci/p2pdma.rst
> >>>  rename Documentation/driver-api/{ => pci}/pci.rst (100%)
> >>>  create mode 100644 drivers/pci/p2pdma.c
> >>>  create mode 100644 include/linux/pci-p2pdma.h
> >>
> >> How do you envision merging this? There's a big chunk in drivers/pci, but
> >> really no opportunity for conflicts there, and there's significant stuff in
> >> block and nvme that I don't really want to merge.
> >>
> >> If Alex is OK with the ACS situation, I can ack the PCI parts and you could
> >> merge it elsewhere?
> >
> > AIUI from previously questioning this, the change is hidden behind a
> > build-time config option and only custom kernels or distros optimized
> > for this sort of support would enable that build option. I'm more than
> > a little dubious though that we're not going to have a wave of distros
> > enabling this only to get user complaints that they can no longer make
> > effective use of their devices for assignment due to the resulting span
> > of the IOMMU groups, nor is there any sort of compromise, configure
> > the kernel for p2p or device assignment, not both. Is this really such
> > a unique feature that distro users aren't going to be asking for both
> > features?
> > Thanks,
> >
> > Alex
> At least 1/2 the cases presented to me by existing customers want it in a tunable kernel,
> and tunable btwn two points, if the hw allows it to be 'contained' in that manner, which
> a (layer of) switch(ing) provides.
> To me, that means a kernel cmdline parameter to _enable_, and another sysfs (configfs? ... i'm not a configfs aficionado to say which is best),
> method to make two points p2p dma capable.

That's not what's done here AIUI. There are also some complications to
making IOMMU groups dynamic. For instance, could a downstream endpoint
already be in use by a userspace tool as ACS is being twiddled in
sysfs? Probably the easiest solution would be that all devices
affected by the ACS change are soft unplugged before and re-added after
the ACS change. Note that "affected" is not necessarily only the
downstream devices if the downstream port at which we're playing with
ACS is part of a multifunction device. Thanks,

Alex
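
To make the "affected" set above concrete, here is a rough userspace
sketch in Python, not kernel code and not anything from the series. It
assumes a hypothetical topology map (port BDF -> BDFs directly below
it), and it assumes that sibling functions of a multifunction port, and
everything below them, count as affected while the port being
reconfigured stays plugged in. Both the helper names and that policy
are illustrative assumptions only.

```python
def parse_bdf(bdf):
    """Split 'DDDD:BB:DD.F' into (domain, bus, device, function)."""
    domain, bus, devfn = bdf.split(":")
    device, function = devfn.split(".")
    return domain, bus, device, function


def affected_devices(port, children):
    """Return the devices to soft unplug before changing ACS on `port`.

    `children` is a hypothetical map from each bridge/port BDF to the
    BDFs directly below it. The port itself is left out of the result,
    since it is the device whose ACS bits are being changed.
    """
    all_bdfs = set(children) | {c for kids in children.values() for c in kids}

    # Functions of one multifunction device share domain, bus and
    # device number; only the function number differs.
    slot = parse_bdf(port)[:3]
    roots = {b for b in all_bdfs if parse_bdf(b)[:3] == slot} | {port}

    # Walk everything below the port and below its sibling functions.
    affected = set()
    stack = list(roots)
    while stack:
        bdf = stack.pop()
        for child in children.get(bdf, ()):
            if child not in affected:
                affected.add(child)
                stack.append(child)

    # Assumption: the sibling functions themselves are affected too.
    affected |= roots - {port}
    return affected


# Made-up topology: 02:00.0 and 02:00.1 are two functions of one
# device, 02:01.0 is a separate single-function downstream port.
topology = {
    "0000:02:00.0": ["0000:03:00.0"],
    "0000:02:00.1": ["0000:04:00.0"],
    "0000:02:01.0": ["0000:05:00.0"],
}
print(sorted(affected_devices("0000:02:00.0", topology)))
print(sorted(affected_devices("0000:02:01.0", topology)))
```

For the single-function port only the endpoint below it is returned;
for the multifunction port the sibling function and the endpoint below
that sibling are pulled in as well, which is the case the note about
multifunction devices is warning about.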