On Wed, Apr 1, 2020 at 1:49 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <alastair@xxxxxxxxxxx> wrote: > > > > This driver exposes LPC memory on OpenCAPI pmem cards > > as an NVDIMM, allowing the existing nvram infrastructure > > to be used. > > > > Namespace metadata is stored on the media itself, so > > scm_reserve_metadata() maps 1 section's worth of PMEM storage > > at the start to hold this. The rest of the PMEM range is registered > > with libnvdimm as an nvdimm. ndctl_config_read/write/size() provide > > callbacks to libnvdimm to access the metadata. > > > > Signed-off-by: Alastair D'Silva <alastair@xxxxxxxxxxx> > > --- > > drivers/nvdimm/Kconfig | 2 + > > drivers/nvdimm/Makefile | 1 + > > drivers/nvdimm/ocxl/Kconfig | 15 ++ > > drivers/nvdimm/ocxl/Makefile | 7 + > > drivers/nvdimm/ocxl/main.c | 476 +++++++++++++++++++++++++++++++++ > > drivers/nvdimm/ocxl/ocxlpmem.h | 23 ++ > > 6 files changed, 524 insertions(+) > > create mode 100644 drivers/nvdimm/ocxl/Kconfig > > create mode 100644 drivers/nvdimm/ocxl/Makefile > > create mode 100644 drivers/nvdimm/ocxl/main.c > > create mode 100644 drivers/nvdimm/ocxl/ocxlpmem.h > > > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig > > index b7d1eb38b27d..368328637182 100644 > > --- a/drivers/nvdimm/Kconfig > > +++ b/drivers/nvdimm/Kconfig > > @@ -131,4 +131,6 @@ config NVDIMM_TEST_BUILD > > core devm_memremap_pages() implementation and other > > infrastructure. > > > > +source "drivers/nvdimm/ocxl/Kconfig" > > + > > endif > > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile > > index 29203f3d3069..bc02be11c794 100644 > > --- a/drivers/nvdimm/Makefile > > +++ b/drivers/nvdimm/Makefile > > @@ -33,3 +33,4 @@ libnvdimm-$(CONFIG_NVDIMM_KEYS) += security.o > > TOOLS := ../../tools > > TEST_SRC := $(TOOLS)/testing/nvdimm/test > > obj-$(CONFIG_NVDIMM_TEST_BUILD) += $(TEST_SRC)/iomap.o > > +obj-$(CONFIG_LIBNVDIMM) += ocxl/ > > diff --git a/drivers/nvdimm/ocxl/Kconfig b/drivers/nvdimm/ocxl/Kconfig > > new file mode 100644 > > index 000000000000..c5d927520920 > > --- /dev/null > > +++ b/drivers/nvdimm/ocxl/Kconfig > > @@ -0,0 +1,15 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +if LIBNVDIMM > > + > > +config OCXL_PMEM > > + tristate "OpenCAPI Persistent Memory" > > + depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH && ZONE_DEVICE && OCXL > > Does OXCL_PMEM itself have any CONFIG_ZONE_DEVICE dependencies? That's > more a function of CONFIG_DEV_DAX and CONFIG_FS_DAX. Doesn't OCXL > already depend on CONFIG_PCI? > > > > + help > > + Exposes devices that implement the OpenCAPI Storage Class Memory > > + specification as persistent memory regions. You may also want > > + DEV_DAX, DEV_DAX_PMEM & FS_DAX if you plan on using DAX devices > > + stacked on top of this driver. > > + > > + Select N if unsure. > > + > > +endif > > diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile > > new file mode 100644 > > index 000000000000..e0e8ade1987a > > --- /dev/null > > +++ b/drivers/nvdimm/ocxl/Makefile > > @@ -0,0 +1,7 @@ > > +# SPDX-License-Identifier: GPL-2.0 > > + > > +ccflags-$(CONFIG_PPC_WERROR) += -Werror > > + > > +obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o > > + > > +ocxlpmem-y := main.o > > \ No newline at end of file > > diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c > > new file mode 100644 > > index 000000000000..c0066fedf9cc > > --- /dev/null > > +++ b/drivers/nvdimm/ocxl/main.c > > @@ -0,0 +1,476 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > +// Copyright 2020 IBM Corp. > > + > > +/* > > + * A driver for OpenCAPI devices that implement the Storage Class > > + * Memory specification. > > + */ > > + > > +#include <linux/module.h> > > +#include <misc/ocxl.h> > > +#include <linux/ndctl.h> > > +#include <linux/mm_types.h> > > +#include <linux/memory_hotplug.h> > > +#include "ocxlpmem.h" > > + > > +static const struct pci_device_id pci_tbl[] = { > > + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), }, > > + { } > > +}; > > + > > +MODULE_DEVICE_TABLE(pci, pci_tbl); > > + > > +#define NUM_MINORS 256 // Total to reserve > > + > > +static dev_t ocxlpmem_dev; > > +static struct class *ocxlpmem_class; > > +static struct mutex minors_idr_lock; > > +static struct idr minors_idr; > > + > > +/** > > + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl > > + * @ocxlpmem: the device metadata > > + * @command: the incoming data to write > > + * Return: 0 on success, negative on failure > > + */ > > +static int ndctl_config_write(struct ocxlpmem *ocxlpmem, > > + struct nd_cmd_set_config_hdr *command) > > +{ > > + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) > > + return -EINVAL; > > + > > + memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, > > + command->in_buf, command->in_length); > > + > > + return 0; > > +} > > + > > +/** > > + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl > > + * @ocxlpmem: the device metadata > > + * @command: the read request > > + * Return: 0 on success, negative on failure > > + */ > > +static int ndctl_config_read(struct ocxlpmem *ocxlpmem, > > + struct nd_cmd_get_config_data_hdr *command) > > +{ > > + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) > > + return -EINVAL; > > + > > + memcpy_mcsafe(command->out_buf, > > + ocxlpmem->metadata_addr + command->in_offset, > > + command->in_length); > > + > > + return 0; > > +} > > + > > +/** > > + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl > > + * @command: the read request > > + * Return: 0 on success, negative on failure > > + */ > > +static int ndctl_config_size(struct nd_cmd_get_config_size *command) > > +{ > > + command->status = 0; > > + command->config_size = LABEL_AREA_SIZE; > > + command->max_xfer = PAGE_SIZE; > > + > > + return 0; > > +} > > + > > +static int ndctl(struct nvdimm_bus_descriptor *nd_desc, > > + struct nvdimm *nvdimm, > > + unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) > > +{ > > + struct ocxlpmem *ocxlpmem = container_of(nd_desc, > > + struct ocxlpmem, bus_desc); > > + > > + switch (cmd) { > > + case ND_CMD_GET_CONFIG_SIZE: > > + *cmd_rc = ndctl_config_size(buf); > > + return 0; > > + > > + case ND_CMD_GET_CONFIG_DATA: > > + *cmd_rc = ndctl_config_read(ocxlpmem, buf); > > + return 0; > > + > > + case ND_CMD_SET_CONFIG_DATA: > > + *cmd_rc = ndctl_config_write(ocxlpmem, buf); > > + return 0; > > + > > + default: > > + return -ENOTTY; > > + } > > +} > > + > > +/** > > + * reserve_metadata() - Reserve space for nvdimm metadata > > + * @ocxlpmem: the device metadata > > + * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device > > + */ > > +static int reserve_metadata(struct ocxlpmem *ocxlpmem, > > + struct resource *lpc_mem) > > +{ > > + ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start, > > + LABEL_AREA_SIZE, MEMREMAP_WB); > > + if (IS_ERR(ocxlpmem->metadata_addr)) > > + return PTR_ERR(ocxlpmem->metadata_addr); > > + > > + return 0; > > +} > > + > > +/** > > + * register_lpc_mem() - Discover persistent memory on a device and register it with the NVDIMM subsystem > > + * @ocxlpmem: the device metadata > > + * Return: 0 on success > > + */ > > +static int register_lpc_mem(struct ocxlpmem *ocxlpmem) > > +{ > > + struct nd_region_desc region_desc; > > + struct nd_mapping_desc nd_mapping_desc; > > + struct resource *lpc_mem; > > + const struct ocxl_afu_config *config; > > + const struct ocxl_fn_config *fn_config; > > + int rc; > > + unsigned long nvdimm_cmd_mask = 0; > > + unsigned long nvdimm_flags = 0; > > + int target_node; > > + char serial[16 + 1]; > > + > > + // Set up the reserved metadata area > > + rc = ocxl_afu_map_lpc_mem(ocxlpmem->ocxl_afu); > > + if (rc < 0) > > + return rc; > > + > > + lpc_mem = ocxl_afu_lpc_mem(ocxlpmem->ocxl_afu); > > + if (!lpc_mem || !lpc_mem->start) > > + return -EINVAL; > > + > > + config = ocxl_afu_config(ocxlpmem->ocxl_afu); > > + fn_config = ocxl_function_config(ocxlpmem->ocxl_fn); > > + > > + rc = reserve_metadata(ocxlpmem, lpc_mem); > > + if (rc) > > + return rc; > > + > > + ocxlpmem->bus_desc.provider_name = "ocxlpmem"; > > + ocxlpmem->bus_desc.ndctl = ndctl; > > + ocxlpmem->bus_desc.module = THIS_MODULE; > > + > > + ocxlpmem->nvdimm_bus = nvdimm_bus_register(&ocxlpmem->dev, > > + &ocxlpmem->bus_desc); > > + if (!ocxlpmem->nvdimm_bus) > > + return -EINVAL; > > + > > + ocxlpmem->pmem_res.start = (u64)lpc_mem->start + LABEL_AREA_SIZE; > > + ocxlpmem->pmem_res.end = (u64)lpc_mem->start + config->lpc_mem_size - 1; > > + ocxlpmem->pmem_res.name = "OpenCAPI persistent memory"; > > + > > + set_bit(ND_CMD_GET_CONFIG_SIZE, &nvdimm_cmd_mask); > > + set_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm_cmd_mask); > > + set_bit(ND_CMD_SET_CONFIG_DATA, &nvdimm_cmd_mask); > > + > > + set_bit(NDD_ALIASING, &nvdimm_flags); > > + > > + snprintf(serial, sizeof(serial), "%llx", fn_config->serial); > > + nd_mapping_desc.nvdimm = nvdimm_create(ocxlpmem->nvdimm_bus, ocxlpmem, > > + NULL, nvdimm_flags, > > + nvdimm_cmd_mask, 0, NULL); > > + if (!nd_mapping_desc.nvdimm) > > + return -ENOMEM; > > + > > + if (nvdimm_bus_check_dimm_count(ocxlpmem->nvdimm_bus, 1)) > > + return -EINVAL; > > + > > + nd_mapping_desc.start = ocxlpmem->pmem_res.start; > > + nd_mapping_desc.size = resource_size(&ocxlpmem->pmem_res); > > + nd_mapping_desc.position = 0; > > + > > + ocxlpmem->nd_set.cookie1 = fn_config->serial; > > + ocxlpmem->nd_set.cookie2 = fn_config->serial; > > + > > + target_node = of_node_to_nid(ocxlpmem->pdev->dev.of_node); > > + > > + memset(®ion_desc, 0, sizeof(region_desc)); > > + region_desc.res = &ocxlpmem->pmem_res; > > + region_desc.numa_node = NUMA_NO_NODE; > > + region_desc.target_node = target_node; > > + region_desc.num_mappings = 1; > > + region_desc.mapping = &nd_mapping_desc; > > + region_desc.nd_set = &ocxlpmem->nd_set; > > + > > + set_bit(ND_REGION_PAGEMAP, ®ion_desc.flags); > > + /* > > + * NB: libnvdimm copies the data from ndr_desc into it's own > > + * structures so passing a stack pointer is fine. > > + */ > > + ocxlpmem->nd_region = nvdimm_pmem_region_create(ocxlpmem->nvdimm_bus, > > + ®ion_desc); > > + if (!ocxlpmem->nd_region) > > + return -EINVAL; > > + > > + dev_info(&ocxlpmem->dev, > > + "Onlining %lluMB of persistent memory\n", > > + nd_mapping_desc.size / SZ_1M); > > + > > + return 0; > > +} > > + > > +/** > > + * allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device > > + * @ocxlpmem: the device metadata > > + * Return: the allocated minor number > > + */ > > +static int allocate_minor(struct ocxlpmem *ocxlpmem) > > +{ > > + int minor; > > + > > + mutex_lock(&minors_idr_lock); > > + minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL); > > + mutex_unlock(&minors_idr_lock); > > + return minor; > > +} > > + > > +static void free_minor(struct ocxlpmem *ocxlpmem) > > +{ > > + mutex_lock(&minors_idr_lock); > > + idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt)); > > + mutex_unlock(&minors_idr_lock); > > +} > > + > > +/** > > + * free_ocxlpmem() - Free all members of an ocxlpmem struct > > + * @ocxlpmem: the device struct to clear > > + */ > > +static void free_ocxlpmem(struct ocxlpmem *ocxlpmem) > > +{ > > + int rc; > > + > > + free_minor(ocxlpmem); > > + > > + if (ocxlpmem->ocxl_context) { > > + rc = ocxl_context_detach(ocxlpmem->ocxl_context); > > + if (rc == -EBUSY) > > + dev_warn(&ocxlpmem->dev, "Timeout detaching ocxl context\n"); > > + else > > + ocxl_context_free(ocxlpmem->ocxl_context); > > + } > > + > > + if (ocxlpmem->ocxl_afu) > > + ocxl_afu_put(ocxlpmem->ocxl_afu); > > + > > + if (ocxlpmem->ocxl_fn) > > + ocxl_function_close(ocxlpmem->ocxl_fn); > > + > > + pci_dev_put(ocxlpmem->pdev); > > + > > + kfree(ocxlpmem); > > +} > > + > > +/** > > + * free_ocxlpmem_dev() - Free an OpenCAPI persistent memory device > > + * @dev: The device struct > > + */ > > +static void free_ocxlpmem_dev(struct device *dev) > > +{ > > + struct ocxlpmem *ocxlpmem = container_of(dev, struct ocxlpmem, dev); > > + > > + free_ocxlpmem(ocxlpmem); > > +} > > + > > +/** > > + * ocxlpmem_register() - Register an OpenCAPI pmem device with the kernel > > + * @ocxlpmem: the device metadata > > + * Return: 0 on success, negative on failure > > + */ > > +static int ocxlpmem_register(struct ocxlpmem *ocxlpmem) > > +{ > > + int rc; > > + int minor = allocate_minor(ocxlpmem); > > + > > + if (minor < 0) > > + return minor; > > + > > + ocxlpmem->dev.release = free_ocxlpmem_dev; > > + rc = dev_set_name(&ocxlpmem->dev, "ocxlpmem%d", minor); > > + if (rc < 0) > > + return rc; > > + > > + ocxlpmem->dev.devt = MKDEV(MAJOR(ocxlpmem_dev), minor); > > + ocxlpmem->dev.class = ocxlpmem_class; > > + ocxlpmem->dev.parent = &ocxlpmem->pdev->dev; > > + > > + return device_register(&ocxlpmem->dev); > > You lost me, why does this need it's own ioctl path and device node? > I'd expect you you'd tunnel the commands you need through the existing > infrastructure ideally with an eye for cross-arch compatibility. This > is a fundamental concern that probably has significant implications > for what follows. At a minimum this device registration should move to the end of the patchset once all the libnvdimm generic enabling is done. Even then I think the libnvdimm core has all the hooks you need to provide device specific enabling *through* the existing interfaces. Have a look at the how the ACPI NFIT bus driver provides "nfit/" scoped sysfs attributes and tunnels "struct nd_cmd_pkg" ioctl invocations to either the bus or "dimm" level device objects. The dimm security model is an example of an implementation that wraps a libnvdimm-generic sysfs front-end to an intel-specific-device backend. It would be unfortunate to need to invent another ABI.