Add the documentation files: - Documentation/PCI/endpoint/pci-nvme-function.rst - Documentation/PCI/endpoint/pci-nvme-howto.rst - Documentation/PCI/endpoint/function/binding/pci-nvme.rst To respectively document the NVMe PCI endpoint function driver internals, provide a user guide explaning how to setup an NVMe endpoint device and describe the NVMe endpoint function driver binding attributes. Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx> --- .../endpoint/function/binding/pci-nvme.rst | 34 ++++ Documentation/PCI/endpoint/index.rst | 3 + .../PCI/endpoint/pci-nvme-function.rst | 151 ++++++++++++++ Documentation/PCI/endpoint/pci-nvme-howto.rst | 189 ++++++++++++++++++ MAINTAINERS | 2 + 5 files changed, 379 insertions(+) create mode 100644 Documentation/PCI/endpoint/function/binding/pci-nvme.rst create mode 100644 Documentation/PCI/endpoint/pci-nvme-function.rst create mode 100644 Documentation/PCI/endpoint/pci-nvme-howto.rst diff --git a/Documentation/PCI/endpoint/function/binding/pci-nvme.rst b/Documentation/PCI/endpoint/function/binding/pci-nvme.rst new file mode 100644 index 000000000000..d80293c08bcd --- /dev/null +++ b/Documentation/PCI/endpoint/function/binding/pci-nvme.rst @@ -0,0 +1,34 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +PCI NVMe Endpoint Function +========================== + +1) Create the function subdirectory pci_epf_nvme.0 in the +pci_ep/functions/pci_epf_nvme directory of configfs. + +Standard EPF Configurable Fields: + +================ =========================================================== +vendorid Do not care (e.g. PCI_ANY_ID) +deviceid Do not care (e.g. PCI_ANY_ID) +revid Do not care +progif_code Must be 0x02 (NVM Express) +baseclass_code Must be 0x1 (PCI_BASE_CLASS_STORAGE) +subclass_code Must be 0x08 (Non-Volatile Memory controller) +cache_line_size Do not care +subsys_vendor_id Do not care (e.g. PCI_ANY_ID) +subsys_id Do not care (e.g. PCI_ANY_ID) +msi_interrupts At least equal to the number of queue pairs desired +msix_interrupts At least equal to the number of queue pairs desired +interrupt_pin Interrupt PIN to use if MSI and MSI-X are not supported +================ =========================================================== + +The NVMe EPF specific configurable fields are in the nvme subdirectory of the +directory created in 1 + +================ =========================================================== +ctrl_opts NVMe target connection parameters +dma_enable Enable (1) or disable (0) DMA transfers; default = 1 +mdts_kb Maximum data transfer size in KiB; default = 128 +================ =========================================================== diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst index 4d2333e7ae06..764f1e8f81f2 100644 --- a/Documentation/PCI/endpoint/index.rst +++ b/Documentation/PCI/endpoint/index.rst @@ -15,6 +15,9 @@ PCI Endpoint Framework pci-ntb-howto pci-vntb-function pci-vntb-howto + pci-nvme-function + pci-nvme-howto function/binding/pci-test function/binding/pci-ntb + function/binding/pci-nvme diff --git a/Documentation/PCI/endpoint/pci-nvme-function.rst b/Documentation/PCI/endpoint/pci-nvme-function.rst new file mode 100644 index 000000000000..ac8baa5556be --- /dev/null +++ b/Documentation/PCI/endpoint/pci-nvme-function.rst @@ -0,0 +1,151 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +PCI NVMe Function +================= + +:Author: Damien Le Moal <dlemoal@xxxxxxxxxx> + +The PCI NVMe endpoint function driver implements a PCIe NVMe controller for a +local NVMe fabrics host controller. The fabrics controller target can use any +of the transports supported by the NVMe driver. In practice, using small SBC +boards equipped with a PCI endpoint controller, loop targets to files or block +devices or TCP targets to remote NVMe devices can be easily used. + +Overview +======== + +The NVMe endpoint function driver relies as most as possible on the NVMe +fabrics driver for executing NVMe commands received from the PCI RC host to +minimize NVMe command parsing. However, some admin commands must be modified to +satisfy PCI transport specifications constraints (e.g. queue management +commands support and the optional SGL support). + +Capabilities +------------ + +The NVMe capabilities exposed to the PCI RC host through the BAR 0 registers +are almost identical to the capabilities of the NVMe fabrics controller, with +some exceptions: + +1) NVMe-over-fabrics specifications mandate support for SGL. Howerver, this + capability is not exposed as supported because the current NVMe endpoint + driver code does not support SGL. + +2) The NVMe endpoint function driver can expose a different MDTS (Maximum Data + Transfer Size) than the fabrics controller used. + +Maximum Number of Queue Pairs +----------------------------- + +Upon binding of the NVMe endpoint function driver to the endpoint controller, +BAR 0 is allocated with enough space to accommodate up to +PCI_EPF_NVME_MAX_NR_QUEUES (16) queue pairs. This relatively low number is +necessary to avoid running out of memory windows for mapping PCI addresses to +local endpoint controller memory. + +The number of memory windows necessary for operation is roughly at most: +1) One memory window for raising MSI/MSI-X interrupts +2) One memory window for command PRP and data transfers +3) One memory window for each submission queue +4) One memory window for each completion queue + +Given the highly asynchronous nature of the NVMe endpoint function driver +operation, the memory windows needed as described above will generally not be +used simultaneously, but that may happen. So a safe maximum number of queue +pairs that can be supported is equal to the maximum number of memory windows of +the endpoint controller minus two and divided by two. E.g. for an endpoint PCI +controller with 32 outbound memory windows available, up to 10 queue pairs can +be safely operated without any risk of getting PCI space mapping errors due to +the lack of memory windows. + +The NVMe endpoint function driver allows configuring the maximum number of +queue pairs through configfs. + +Command Execution +================= + +The NVMe endpoint function driver relies on several work items to process NVMe +commands issued by the PCI RC host. + +Register Poll Work +------------------ + +The register poll work is a delayed work used to poll for changes to the +controller state register. This is used to detect operations initiated by the +PCI host such as enabling or enabling the NVMe controller. The register poll +work is scheduled every 5 ms. + +Submission Queue Work +--------------------- + +Upon creation of submission queues, starting with the submission queue for +admin commands, a delayed work is created and scheduled for execution every +jiffy to poll for a submission queue doorbell to detect submission of commands +by the PCI host. + +When changes to a submission queue work are detected by a submission queue +work, the work allocates a command structure to copy the NVMe command issued by +the PCI host and schedules processing of the command using the command work. + +Command Processing Work +----------------------- + +This per-NVMe command work is scheduled for execution when an NVMe command is +received from the host. This work will: + +1) Does minimal parsing of the NVMe command to determine if the command has a + data buffer. If it does, the PRP list for the command is retrieved to + identify the PCI address ranges used for the command data buffer. This can + lead to the command buffer being represented using several discontiguous + memory fragments. A local memory buffer is also allocated for local + execution of the command using the fabrics controller. + +2) If the command is a write command (DMA direction from host to device), data + is transferred from the host to the local memory buffer of the command. This + is handled in a loop to process all fragments of the command buffer as well + as simultaneously handle PCI address mapping constraints of the PCI endpoint + controller. + +3) The command is then executed using the NVMe driver fabrics code. This blocks + the command work until the command execution completes. + +4) When the command completes, the command work schedules handling of the + command response using the completion queue work. + +Completion Queue Work +--------------------- + +This per-completion queue work is used to aggregate handling of responses to +completed commands in batches to avoid having to issue an IRQ for every +completed command. This work is sceduled every time a command completes and +does: + +1) Post a command completion entry for all completed commands. + +2) Update the completion queue doorbell. + +3) Raise an IRQ to signal the host that commands have completed. + +Configuration +============= + +The NVMe endpoint function driver can be fully controlled using configfs, once +a NVMe fabrics target is also setup. The available configfs parameters are: + + ctrl_opts + + Fabrics controller connection arguments, as formatted for + the nvme cli "connect" command. + + dma_enable + + Enable (default) or disable DMA data transfers. + + mdts_kb + + Change the maximum data transfer size (default: 128 KB). + +See Documentation/PCI/endpoint/pci-nvme-howto.rst for a more detailed +description of these parameters and how to use them to configure an NVMe +endpoint function driver. diff --git a/Documentation/PCI/endpoint/pci-nvme-howto.rst b/Documentation/PCI/endpoint/pci-nvme-howto.rst new file mode 100644 index 000000000000..e15e8453b6d5 --- /dev/null +++ b/Documentation/PCI/endpoint/pci-nvme-howto.rst @@ -0,0 +1,189 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=========================================== +PCI NVMe Endpoint Function (EPF) User Guide +=========================================== + +:Author: Damien Le Moal <dlemoal@xxxxxxxxxx> + +This document is a guide to help users use the pci-epf-nvme function driver to +create PCIe NVMe controllers. For a high-level description of the NVMe function +driver internals, see Documentation/PCI/endpoint/pci-nvme-function.rst. + +Hardware and Kernel Requirements +================================ + +To use the NVMe PCI endpoint driver, at least one endpoint controller device +is required. + +To find the list of endpoint controller devices in the system:: + + # ls /sys/class/pci_epc/ + a40000000.pcie-ep + +If PCI_ENDPOINT_CONFIGFS is enabled:: + + # ls /sys/kernel/config/pci_ep/controllers + a40000000.pcie-ep + +Compiling the NVMe endpoint function driver depends on the target support of +the NVMe driver being enabled (CONFIG_NVME_TARGET). It is also recommended to +enable CONFIG_NVME_TARGET_LOOP to enable the use of loop targets (to use files +or block devices as storage for the NVMe target device). If the board used also +supports ethernet, CONFIG_NVME_TCP can be set to enable the use of remote TCP +NVMe targets. + +To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK) +is also recommended. With this, a simple setup using a null_blk block device +with an NVMe loop target can be used. + + +NVMe Endpoint Device +==================== + +Creating an NVMe endpoint device is a two step process. First, an NVMe target +device must be defined. Second, the NVMe endpoint device must be setup using +the defined NVMe target device. + +Creating a NVMe Target Device +----------------------------- + +Details about how to configure and NVMe target are outside the scope of this +document. The following only provides a simple example of a loop target setup +using a null_blk device for storage. + +First, make sure that configfs is enabled:: + + # mount -t configfs none /sys/kernel/config + +Next, create a null_blk device (default settings give a 250 GB device without +memory backing). The block device created will be /dev/nullb0 by default:: + + # modprobe null_blk + # ls /dev/nullb0 + /dev/nullb0 + +The NVMe loop target driver must be loaded:: + + # modprobe nvme_loop + # lsmod | grep nvme + nvme_loop 16384 0 + nvmet 106496 1 nvme_loop + nvme_fabrics 28672 1 nvme_loop + nvme_core 131072 3 nvme_loop,nvmet,nvme_fabrics + +Now, create the NVMe loop target, starting with the NVMe subsystem, specifying +a maximum of 4 queue pairs:: + + # cd /sys/kernel/config/nvmet/subsystems + # mkdir pci_epf_nvme.0.nqn + # echo -n "Linux-pci-epf" > pci_epf_nvme.0.nqn/attr_model + # echo 4 > pci_epf_nvme.0.nqn/attr_qid_max + # echo 1 > pci_epf_nvme.0.nqn/attr_allow_any_host + +Next, create the target namespace using the null_blk block device:: + + # mkdir pci_epf_nvme.0.nqn/namespaces/1 + # echo -n "/dev/nullb0" > pci_epf_nvme.0.nqn/namespaces/1/device_path + # echo 1 > "pci_epf_nvme.0.nqn/namespaces/1/enable" + +Finally, create the target port and link it to the subsystem:: + + # cd /sys/kernel/config/nvmet/ports + # mkdir 1 + # echo -n "loop" > 1/addr_trtype + # ln -s /sys/kernel/config/nvmet/subsystems/pci_epf_nvme.0.nqn + 1/subsystems/pci_epf_nvme.0.nqn + + +Creating a NVMe Endpoint Device +------------------------------- + +With the NVMe target ready for use, the NVMe PCI endpoint device can now be +created and enabled. The first step is to load the NVMe function driver:: + + # modprobe pci_epf_nvme + # ls /sys/kernel/config/pci_ep/functions + pci_epf_nvme + +Next, create function 0:: + + # cd /sys/kernel/config/pci_ep/functions/pci_epf_nvme + # mkdir pci_epf_nvme.0 + # ls pci_epf_nvme.0/ + baseclass_code msix_interrupts secondary + cache_line_size nvme subclass_code + deviceid primary subsys_id + interrupt_pin progif_code subsys_vendor_id + msi_interrupts revid vendorid + +Configure the function using any vendor ID and device ID:: + + # cd /sys/kernel/config/pci_ep/functions/pci_epf_nvme/pci_epf_nvme.0 + # echo 0x15b7 > vendorid + # echo 0x5fff > deviceid + # echo 32 > msix_interrupts + # echo -n "transport=loop,nqn=pci_epf_nvme.0.nqn,nr_io_queues=4" > \ + ctrl_opts + +The ctrl_opts attribute must be set using equivalent arguments as used for a +norma NVMe target connection using "nvme connect" command. For the example +above, the equivalen target connection command is:: + + # nvme connect --transport=loop --nqn=pci_epf_nvme.0.nqn --nr-io-queues=4 + +The endpoint function can then be bound to the endpoint controller and the +controller started:: + + # cd /sys/kernel/config/pci_ep + # ln -s functions/pci_epf_nvme/pci_epf_nvme.0 controllers/a40000000.pcie-ep/ + # echo 1 > controllers/a40000000.pcie-ep/start + +Kernel messages will show information as the NVMe target device and endpoint +device are created and connected. + +.. code-block:: text + + pci_epf_nvme: Registered nvme EPF driver + nvmet: adding nsid 1 to subsystem pci_epf_nvme.0.nqn + pci_epf_nvme pci_epf_nvme.0: DMA RX channel dma3chan2, maximum segment size 4294967295 B + pci_epf_nvme pci_epf_nvme.0: DMA TX channel dma3chan0, maximum segment size 4294967295 B + pci_epf_nvme pci_epf_nvme.0: DMA supported + nvmet: creating nvm controller 1 for subsystem pci_epf_nvme.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:0aa34ec6-11c0-4b02-ac9b-e07dff4b5c84. + nvme nvme0: creating 4 I/O queues. + nvme nvme0: new ctrl: "pci_epf_nvme.0.nqn" + pci_epf_nvme pci_epf_nvme.0: NVMe fabrics controller created, 4 I/O queues + pci_epf_nvme pci_epf_nvme.0: NVMe PCI controller supports MSI-X, 32 vectors + pci_epf_nvme pci_epf_nvme.0: NVMe PCI controller: 4 I/O queues + + +PCI RootComplex Host +==================== + +Booting the host, the NVMe endpoint device will be discoverable as a PCI device:: + + # lspci -n + 0000:01:00.0 0108: 15b7:5fff + +An this device will be recognized as an NVMe device with a single namespace:: + + # lsblk + NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS + nvme0n1 259:0 0 250G 0 disk + +The NVMe endpoint block device can then be used as any other regular NVMe +device. The nvme command line utility can be used to get more detailed +information about the endpoint device:: + + # nvme id-ctrl /dev/nvme0 + NVME Identify Controller: + vid : 0x15b7 + ssvid : 0x15b7 + sn : 0ec249554579a1d08fb5 + mn : Linux-pci-epf + fr : 6.12.0-r + rab : 6 + ieee : 000000 + cmic : 0 + mdts : 5 + ... diff --git a/MAINTAINERS b/MAINTAINERS index 301e0a1b56e8..48431d2aa751 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16563,6 +16563,8 @@ M: Damien Le Moal <dlemoal@xxxxxxxxxx> L: linux-pci@xxxxxxxxxxxxxxx L: linux-nvme@xxxxxxxxxxxxxxxxxxx S: Supported +F: Documentation/PCI/endpoint/function/binding/pci-nvme.rst +F: Documentation/PCI/endpoint/pci-nvme-*.rst F: drivers/pci/endpoint/functions/pci-epf-nvme.c NVM EXPRESS DRIVER -- 2.47.0