Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21-02-10 17:41:04, Jonathan Cameron wrote:
> On Tue, 9 Feb 2021 16:02:53 -0800
> Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> 
> > Provide enough functionality to utilize the mailbox of a memory device.
> > The mailbox is used to interact with the firmware running on the memory
> > device. The flow is proven with one implemented command, "identify".
> > Because the class code has already told the driver this is a memory
> > device and the identify command is mandatory.
> > 
> > CXL devices contain an array of capabilities that describe the
> > interactions software can have with the device or firmware running on
> > the device. A CXL compliant device must implement the device status and
> > the mailbox capability. Additionally, a CXL compliant memory device must
> > implement the memory device capability. Each of the capabilities can
> > [will] provide an offset within the MMIO region for interacting with the
> > CXL device.
> > 
> > The capabilities tell the driver how to find and map the register space
> > for CXL Memory Devices. The registers are required to utilize the CXL
> > spec defined mailbox interface. The spec outlines two mailboxes, primary
> > and secondary. The secondary mailbox is earmarked for system firmware,
> > and not handled in this driver.
> > 
> > Primary mailboxes are capable of generating an interrupt when submitting
> > a background command. That implementation is saved for a later time.
> > 
> > Link: https://www.computeexpresslink.org/download-the-specification
> > Signed-off-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> 
> A few more comments inline (proper review whereas my other reply was a
> bug chase).
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/Kconfig           |  14 +
> >  drivers/cxl/cxl.h             |  93 +++++++
> >  drivers/cxl/mem.c             | 511 +++++++++++++++++++++++++++++++++-
> >  drivers/cxl/pci.h             |  13 +
> >  include/uapi/linux/pci_regs.h |   1 +
> >  5 files changed, 630 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/cxl/cxl.h
> > 
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 9e80b311e928..c4ba3aa0a05d 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -32,4 +32,18 @@ config CXL_MEM
> >  	  Chapter 2.3 Type 3 CXL Device in the CXL 2.0 specification.
> >  
> >  	  If unsure say 'm'.
> > +
> > +config CXL_MEM_INSECURE_DEBUG
> > +	bool "CXL.mem debugging"
> 
> As mentioned below, this makes me a tiny bit uncomfortable.
> 
> > +	depends on CXL_MEM
> > +	help
> > +	  Enable debug of all CXL command payloads.
> > +
> > +	  Some CXL devices and controllers support encryption and other
> > +	  security features. The payloads for the commands that enable
> > +	  those features may contain sensitive clear-text security
> > +	  material. Disable debug of those command payloads by default.
> > +	  If you are a kernel developer actively working on CXL
> > +	  security enabling say Y, otherwise say N.
> > +
> >  endif
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > new file mode 100644
> > index 000000000000..745f5e0bfce3
> > --- /dev/null
> > +++ b/drivers/cxl/cxl.h
> > @@ -0,0 +1,93 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright(c) 2020 Intel Corporation. */
> > +
> > +#ifndef __CXL_H__
> > +#define __CXL_H__
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/io.h>
> > +
> > +/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
> > +#define CXLDEV_CAP_ARRAY_OFFSET 0x0
> > +#define   CXLDEV_CAP_ARRAY_CAP_ID 0
> > +#define   CXLDEV_CAP_ARRAY_ID_MASK GENMASK(15, 0)
> > +#define   CXLDEV_CAP_ARRAY_COUNT_MASK GENMASK(47, 32)
> > +/* CXL 2.0 8.2.8.2.1 CXL Device Capabilities */
> > +#define CXLDEV_CAP_CAP_ID_DEVICE_STATUS 0x1
> > +#define CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX 0x2
> > +#define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
> > +#define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
> > +
> > +/* CXL 2.0 8.2.8.4 Mailbox Registers */
> > +#define CXLDEV_MBOX_CAPS_OFFSET 0x00
> > +#define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> > +#define CXLDEV_MBOX_CTRL_OFFSET 0x04
> > +#define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
> > +#define CXLDEV_MBOX_CMD_OFFSET 0x08
> > +#define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK(15, 0)
> > +#define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK(36, 16)
> > +#define CXLDEV_MBOX_STATUS_OFFSET 0x10
> > +#define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK(47, 32)
> > +#define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> > +#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
> > +
> > +/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > +#define CXLMDEV_STATUS_OFFSET 0x0
> > +#define   CXLMDEV_DEV_FATAL BIT(0)
> > +#define   CXLMDEV_FW_HALT BIT(1)
> > +#define   CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2)
> > +#define     CXLMDEV_MS_NOT_READY 0
> > +#define     CXLMDEV_MS_READY 1
> > +#define     CXLMDEV_MS_ERROR 2
> > +#define     CXLMDEV_MS_DISABLED 3
> > +#define CXLMDEV_READY(status)                                                  \
> > +	(FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) ==                \
> > +	 CXLMDEV_MS_READY)
> > +#define   CXLMDEV_MBOX_IF_READY BIT(4)
> > +#define   CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5)
> > +#define     CXLMDEV_RESET_NEEDED_NOT 0
> > +#define     CXLMDEV_RESET_NEEDED_COLD 1
> > +#define     CXLMDEV_RESET_NEEDED_WARM 2
> > +#define     CXLMDEV_RESET_NEEDED_HOT 3
> > +#define     CXLMDEV_RESET_NEEDED_CXL 4
> > +#define CXLMDEV_RESET_NEEDED(status)                                           \
> > +	(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) !=                       \
> > +	 CXLMDEV_RESET_NEEDED_NOT)
> > +
> > +/**
> > + * struct cxl_mem - A CXL memory device
> > + * @pdev: The PCI device associated with this CXL device.
> > + * @regs: IO mappings to the device's MMIO
> > + * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers
> > + * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers
> > + * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers
> > + * @payload_size: Size of space for payload
> > + *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> > + * @mbox_mutex: Mutex to synchronize mailbox access.
> > + * @firmware_version: Firmware version for the memory device.
> > + * @pmem: Persistent memory capacity information.
> > + * @ram: Volatile memory capacity information.
> > + */
> > +struct cxl_mem {
> > +	struct pci_dev *pdev;
> > +	void __iomem *regs;
> > +
> > +	void __iomem *status_regs;
> > +	void __iomem *mbox_regs;
> > +	void __iomem *memdev_regs;
> > +
> > +	size_t payload_size;
> > +	struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> > +	char firmware_version[0x10];
> > +
> > +	struct {
> > +		struct range range;
> > +	} pmem;
> 
> Christoph raised this in v1, and I agree with him that his would me more compact
> and readable as
> 
> 	struct range pmem_range;
> 	struct range ram_range;
> 
> The discussion seemed to get lost without getting resolved that I can see.
> 

I had been waiting for Dan to chime in, since he authored it. I'll change it and
he can yell if he cares.

> > +
> > +	struct {
> > +		struct range range;
> > +	} ram;
> 
> > +};
> > +
> > +#endif /* __CXL_H__ */
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 99a6571508df..0a868a15badc 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> 
> 
> ...
> 
> > +static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,
> > +				 struct mbox_cmd *mbox_cmd)
> > +{
> > +	struct device *dev = &cxlm->pdev->dev;
> > +
> > +	dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",
> > +		mbox_cmd->opcode, mbox_cmd->size_in);
> > +
> > +	if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {
> 
> Hmm.  Whilst I can see the advantage of this for debug, I'm not sure we want
> it upstream even under a rather evil looking CONFIG variable.
> 
> Is there a bigger lock we can use to avoid chance of accidental enablement?

Any suggestions? I'm told this functionality was extremely valuable for NVDIMM,
though I haven't personally experienced it.

> 
> 
> > +		print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,
> > +				     mbox_cmd->payload_in, mbox_cmd->size_in,
> > +				     true);
> > +	}
> > +}
> > +
> > +/**
> > + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory device.
> > + * @cxlm: The CXL memory device to communicate with.
> > + * @mbox_cmd: Command to send to the memory device.
> > + *
> > + * Context: Any context. Expects mbox_lock to be held.
> > + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 on success.
> > + *         Caller should check the return code in @mbox_cmd to make sure it
> > + *         succeeded.
> > + *
> > + * This is a generic form of the CXL mailbox send command, thus the only I/O
> > + * operations used are cxl_read_mbox_reg(). Memory devices, and perhaps other
> > + * types of CXL devices may have further information available upon error
> > + * conditions.
> > + *
> > + * The CXL spec allows for up to two mailboxes. The intention is for the primary
> > + * mailbox to be OS controlled and the secondary mailbox to be used by system
> > + * firmware. This allows the OS and firmware to communicate with the device and
> > + * not need to coordinate with each other. The driver only uses the primary
> > + * mailbox.
> > + */
> > +static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm,
> > +				 struct mbox_cmd *mbox_cmd)
> > +{
> > +	void __iomem *payload = cxlm->mbox_regs + CXLDEV_MBOX_PAYLOAD_OFFSET;
> > +	u64 cmd_reg, status_reg;
> > +	size_t out_len;
> > +	int rc;
> > +
> > +	lockdep_assert_held(&cxlm->mbox_mutex);
> > +
> > +	/*
> > +	 * Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
> > +	 *   1. Caller reads MB Control Register to verify doorbell is clear
> > +	 *   2. Caller writes Command Register
> > +	 *   3. Caller writes Command Payload Registers if input payload is non-empty
> > +	 *   4. Caller writes MB Control Register to set doorbell
> > +	 *   5. Caller either polls for doorbell to be clear or waits for interrupt if configured
> > +	 *   6. Caller reads MB Status Register to fetch Return code
> > +	 *   7. If command successful, Caller reads Command Register to get Payload Length
> > +	 *   8. If output payload is non-empty, host reads Command Payload Registers
> > +	 *
> > +	 * Hardware is free to do whatever it wants before the doorbell is rung,
> > +	 * and isn't allowed to change anything after it clears the doorbell. As
> > +	 * such, steps 2 and 3 can happen in any order, and steps 6, 7, 8 can
> > +	 * also happen in any order (though some orders might not make sense).
> > +	 */
> > +
> > +	/* #1 */
> > +	if (cxl_doorbell_busy(cxlm)) {
> > +		dev_err_ratelimited(&cxlm->pdev->dev,
> > +				    "Mailbox re-busy after acquiring\n");
> > +		return -EBUSY;
> > +	}
> > +
> > +	cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK,
> > +			     mbox_cmd->opcode);
> > +	if (mbox_cmd->size_in) {
> > +		if (WARN_ON(!mbox_cmd->payload_in))
> > +			return -EINVAL;
> > +
> > +		cmd_reg |= FIELD_PREP(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK,
> > +				      mbox_cmd->size_in);
> > +		memcpy_toio(payload, mbox_cmd->payload_in, mbox_cmd->size_in);
> > +	}
> > +
> > +	/* #2, #3 */
> > +	writeq(cmd_reg, cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET);
> > +
> > +	/* #4 */
> > +	dev_dbg(&cxlm->pdev->dev, "Sending command\n");
> > +	writel(CXLDEV_MBOX_CTRL_DOORBELL,
> > +	       cxlm->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET);
> > +
> > +	/* #5 */
> > +	rc = cxl_mem_wait_for_doorbell(cxlm);
> > +	if (rc == -ETIMEDOUT) {
> > +		cxl_mem_mbox_timeout(cxlm, mbox_cmd);
> > +		return rc;
> > +	}
> > +
> > +	/* #6 */
> > +	status_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_STATUS_OFFSET);
> > +	mbox_cmd->return_code =
> > +		FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
> > +
> > +	if (mbox_cmd->return_code != 0) {
> > +		dev_dbg(&cxlm->pdev->dev, "Mailbox operation had an error\n");
> > +		return 0;
> 
> See earlier diversion whilst I was chasing my bug (another branch of this
> thread)
> 
> > +	}
> > +
> > +	/* #7 */
> > +	cmd_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET);
> > +	out_len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg);
> > +
> > +	/* #8 */
> > +	if (out_len && mbox_cmd->payload_out)
> > +		memcpy_fromio(mbox_cmd->payload_out, payload, out_len);
> > +
> > +	mbox_cmd->size_out = out_len;
> > +
> > +	return 0;
> > +}
> > +
> 
> 
> ...
> 
> > +static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo,
> > +				      u32 reg_hi)
> > +{
> > +	struct device *dev = &pdev->dev;
> > +	struct cxl_mem *cxlm;
> > +	void __iomem *regs;
> > +	u64 offset;
> > +	u8 bar;
> > +	int rc;
> > +
> > +	cxlm = devm_kzalloc(&pdev->dev, sizeof(*cxlm), GFP_KERNEL);
> > +	if (!cxlm) {
> > +		dev_err(dev, "No memory available\n");
> > +		return NULL;
> > +	}
> > +
> > +	offset = ((u64)reg_hi << 32) | FIELD_GET(CXL_REGLOC_ADDR_MASK, reg_lo);
> > +	bar = FIELD_GET(CXL_REGLOC_BIR_MASK, reg_lo);
> > +
> > +	/* Basic sanity check that BAR is big enough */
> > +	if (pci_resource_len(pdev, bar) < offset) {
> > +		dev_err(dev, "BAR%d: %pr: too small (offset: %#llx)\n", bar,
> > +			&pdev->resource[bar], (unsigned long long)offset);
> > +		return NULL;
> > +	}
> > +
> > +	rc = pcim_iomap_regions(pdev, BIT(bar), pci_name(pdev));
> > +	if (rc != 0) {
> 
> if (rc) 
> 
> > +		dev_err(dev, "failed to map registers\n");
> > +		return NULL;
> > +	}
> > +	regs = pcim_iomap_table(pdev)[bar];
> > +
> > +	mutex_init(&cxlm->mbox_mutex);
> > +	cxlm->pdev = pdev;
> > +	cxlm->regs = regs + offset;
> > +
> > +	dev_dbg(dev, "Mapped CXL Memory Device resource\n");
> > +	return cxlm;
> > +}
> >  
> 
> ...
> 
> >  static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  {
> >  	struct device *dev = &pdev->dev;
> > -	int regloc;
> > +	struct cxl_mem *cxlm;
> > +	int rc, regloc, i;
> > +	u32 regloc_size;
> > +
> > +	rc = pcim_enable_device(pdev);
> > +	if (rc)
> > +		return rc;
> >  
> >  	regloc = cxl_mem_dvsec(pdev, PCI_DVSEC_ID_CXL_REGLOC_OFFSET);
> >  	if (!regloc) {
> > @@ -39,7 +509,44 @@ static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  		return -ENXIO;
> >  	}
> >  
> > -	return 0;
> > +	/* Get the size of the Register Locator DVSEC */
> > +	pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
> > +	regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
> > +
> > +	regloc += PCI_DVSEC_ID_CXL_REGLOC_BLOCK1_OFFSET;
> > +
> > +	rc = -ENXIO;
> > +	for (i = regloc; i < regloc + regloc_size; i += 8) {
> > +		u32 reg_lo, reg_hi;
> > +		u8 reg_type;
> > +
> > +		/* "register low and high" contain other bits */
> 
> high doesn't contain any other bits so that's a tiny bit misleading.
> 
> > +		pci_read_config_dword(pdev, i, &reg_lo);
> > +		pci_read_config_dword(pdev, i + 4, &reg_hi);
> > +
> > +		reg_type = FIELD_GET(CXL_REGLOC_RBI_MASK, reg_lo);
> > +
> > +		if (reg_type == CXL_REGLOC_RBI_MEMDEV) {
> > +			rc = 0;
> 
> I sort of assumed this unusual structure was to allow for some future
> change, but checked end result and it still looks like this.
> So, drop the rc assignment here and...
> 

[snip]

> 
> return -ENODEV;
> 

[snip]

> 
> With above direct return, only get here if rc = -ENXIO.
> Could just as easily check if i >= regloc + regloc_size then it's
> obvious this is kind of canonical form of 'not found'.
> 
> 
> Alternative would be to treat the above as a 'find' loop then
> have the clxm = cxl_mem_create() outside of the loop.
> 

I don't recall honestly, but I think it was meant to help distinguish the
failure type.
ENXIO - No register locator found
ENODEV - Some BAR or other resource not found/mapped.

I think this distinction is shown through debug messages or lack of, so I'm fine
to just make it -ENODEV in any failure.

> 
> > +
> > +	rc = cxl_mem_setup_regs(cxlm);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = cxl_mem_setup_mailbox(cxlm);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return cxl_mem_identify(cxlm);
> >  }
> >  
> >  static const struct pci_device_id cxl_mem_pci_tbl[] = {
> > diff --git a/drivers/cxl/pci.h b/drivers/cxl/pci.h
> > index f135b9f7bb21..ffcbc13d7b5b 100644
> > --- a/drivers/cxl/pci.h
> > +++ b/drivers/cxl/pci.h
> > @@ -14,5 +14,18 @@
> >  #define PCI_DVSEC_ID_CXL		0x0
> >  
> >  #define PCI_DVSEC_ID_CXL_REGLOC_OFFSET		0x8
> > +#define PCI_DVSEC_ID_CXL_REGLOC_BLOCK1_OFFSET	0xC
> > +
> > +/* BAR Indicator Register (BIR) */
> > +#define CXL_REGLOC_BIR_MASK GENMASK(2, 0)
> > +
> > +/* Register Block Identifier (RBI) */
> > +#define CXL_REGLOC_RBI_MASK GENMASK(15, 8)
> > +#define CXL_REGLOC_RBI_EMPTY 0
> > +#define CXL_REGLOC_RBI_COMPONENT 1
> > +#define CXL_REGLOC_RBI_VIRT 2
> > +#define CXL_REGLOC_RBI_MEMDEV 3
> > +
> > +#define CXL_REGLOC_ADDR_MASK GENMASK(31, 16)
> 
> CXL_REGLOCL_ADDR_LOW_MASK perhaps for clarity?
> 
> >  
> >  #endif /* __CXL_PCI_H__ */
> > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > index e709ae8235e7..6267ca9ae683 100644
> > --- a/include/uapi/linux/pci_regs.h
> > +++ b/include/uapi/linux/pci_regs.h
> > @@ -1080,6 +1080,7 @@
> >  
> >  /* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
> >  #define PCI_DVSEC_HEADER1		0x4 /* Designated Vendor-Specific Header1 */
> > +#define PCI_DVSEC_HEADER1_LENGTH_MASK	0xFFF00000
> 
> Seems sensible to add the revision mask as well.
> The vendor id currently read using a word read rather than dword, but perhaps
> neater to add that as well for completeness?
> 
> Having said that, given Bjorn's comment on clashes and the fact he'd rather see
> this stuff defined in drivers and combined later (see review patch 1 and follow
> the link) perhaps this series should not touch this header at all.

I'm fine to move it back.

>  
> >  #define PCI_DVSEC_HEADER2		0x8 /* Designated Vendor-Specific Header2 */
> >  
> >  /* Data Link Feature */
> 



[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux