On Thu, Apr 14, 2022 at 01:32:30PM -0700, ira.weiny@xxxxxxxxx wrote: > +static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) [...] > + pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0); > + /* Read the second dword to get the length */ > + pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val); > + pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0); > + > + length = FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH, val); > + if (length > SZ_1M || length < 2) > + return -EIO; > + This API requires consumers to know the size of the response in advance. That's not always the case, as exemplified by SPDM VERSION responses. Jonathan uses a kludge in his SPDM patch which submits a first request with a fixed size of 2 versions and, if that turns out to be too small, submits a second request. It would be nice if consumers are allowed to set request_pl to NULL. Then a payload can be allocated here in pci_doe_recv_resp() with the size retrieved above. A flag may be necessary to indicate that the response is heap-allocated and needs to be freed upon destruction of the pci_doe_task. > + /* First 2 dwords have already been read */ > + length -= 2; > + /* Read the rest of the response payload */ > + for (i = 0; i < min(length, task->response_pl_sz / sizeof(u32)); i++) { > + pci_read_config_dword(pdev, offset + PCI_DOE_READ, > + &task->response_pl[i]); > + pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0); > + } You need to check the Data Object Ready bit. The device may clear the bit prematurely (e.g. as a result of a concurrent FLR or Conventional Reset). You'll continue reading zero dwords from the mailbox and pretend success to the caller even though the response is truncated. If you're concerned about performance when checking the bit on every loop iteration, checking it only on the last but one iteration should be sufficient to detect truncation. > + *vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl); > + *protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL, > + response_pl); > + *index = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX, > + response_pl); The fact that you need line breaks here is an indication that the macros are too long. > +/* DOE Data Object - note not actually registers */ > +#define PCI_DOE_DATA_OBJECT_HEADER_1_VID 0x0000ffff > +#define PCI_DOE_DATA_OBJECT_HEADER_1_TYPE 0x00ff0000 > +#define PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH 0x0003ffff > + > +#define PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX 0x000000ff > +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID 0x0000ffff > +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00ff0000 > +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xff000000 I'd get rid of "DATA_OBJECT_" everywhere, it's redundant with the "Data Object" in "DOE". > +#define PCI_DOE_STATUS_INT_STATUS 0x00000002 /* DOE Interrupt Status */ Another redundancy, I would get rid of the second "_STATUS". > +#define PCI_DOE_STATUS_DATA_OBJECT_READY 0x80000000 /* Data Object Ready */ I would shorten to PCI_DOE_STATUS_READY. > Simplify submitting work to the mailbox > Replace pci_doe_exchange_sync() with > pci_doe_submit_task() Consumers of the mailbox > are now responsible for setting up callbacks > within a task object and submitting them to the > mailbox to be processed. I honestly think that's a mistake. All consumers both in the CDAT as well as in the SPDM series just want to wait for completion of the task. They have no need for an arbitrary callback and shouldn't be burdended with providing one. It just unnecessarily complicates the API. So only providing pci_doe_exchange_sync() and doing away with pci_doe_submit_task() would seem like a more appropriate approach. > +/** > + * pci_doe_for_each_off - Iterate each DOE capability > + * @pdev: struct pci_dev to iterate > + * @off: u16 of config space offset of each mailbox capability found > + */ > +#define pci_doe_for_each_off(pdev, off) \ > + for (off = pci_find_next_ext_capability(pdev, off, \ > + PCI_EXT_CAP_ID_DOE); \ > + off > 0; \ > + off = pci_find_next_ext_capability(pdev, off, \ > + PCI_EXT_CAP_ID_DOE)) > + > +struct pci_doe_mb *pci_doe_create_mb(struct pci_dev *pdev, u16 cap_offset, > + bool use_irq); > +void pci_doe_destroy_mb(struct pci_doe_mb *doe_mb); > +bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); Drivers should not be concerned with the intricacies of DOE capabilities and mailboxes. Moreover, the above API doesn't allow different drivers to access the same DOE mailbox concurrently, e.g. if that mailbox supports multiple protocols. There's no locking to serialize access to the mailbox by the drivers. This should be moved to the PCI core instead: In pci_init_capabilities(), add a new call pci_doe_init() which enumerates all DOE capabilities. Add a list_head to struct pci_dev and add each DOE instance found to that list. Destroy the list elements in pci_destroy_dev(). No locking needed for the list_head, you only ever modify the list on device enumeration and destruction. Then provide a pci_doe_find_mailbox() library function which drivers call to retrieve the pci_doe_mb for a given pci_dev/vid/type tuple. That avoids the need to traverse the list time and again. > +/** > + * struct pci_doe_mb - State for a single DOE mailbox We generally use the same terms as the spec to make it easier for readers to connect the language in the spec to the implementation. The spec uniformly refers to "DOE instance". I guess "mailbox" is slightly more concise, so keep that, but please at least mention the term "instance" in the struct's kernel-doc. This implementation uses the term "task" for one request/response. That term is not mentioned in the spec at all. The spec refers to "exchange" and "transfer" on several occasions, so I would have chosen either one of those instead of the somewhat unfortunate "task". > + * This state is used to manage a single DOE mailbox capability. All fields > + * should be considered opaque to the consumers and the structure passed into > + * the helpers below after being created by devm_pci_doe_create() If the struct is considered opaque, why is it exposed in a public header file? Just use a forward declaration in the header so that consumers can pass around pointers to the struct, and hide the declaration proper in doe.c. > + * @pdev: PCI device this belongs to mailbox belongs to ^^^^^^^^^^ Typo. > + * @prots: Array of protocols supported on this DOE > + * @num_prots: Size of prots array Use @prots instead of prots everywhere in the kernel-doc. > + /* > + * NOTE: doe_mb_prots is freed by pci_doe_free_mb automatically on > + * error if pci_doe_cache_protocols() fails past this point. > + */ s/doe_mb_prots/doe_mb->prots/ s/pci_doe_free_mb/pci_doe_free_mb()/ > + /* DOE requests must be a whole number of DW */ > + if (task->request_pl_sz % sizeof(u32)) > + return -EINVAL; It would be nice if this restriction could be lifted. SPDM uses requests which are not padded to dword-length. It can run over other transports which may not impose such restrictions. The SPDM layer should not need to worry about quirks of the transport layer. > +static irqreturn_t pci_doe_irq_handler(int irq, void *data) > +{ > + struct pci_doe_mb *doe_mb = data; > + struct pci_dev *pdev = doe_mb->pdev; > + int offset = doe_mb->cap_offset; > + u32 val; > + > + pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val); > + > + /* Leave the error case to be handled outside IRQ */ > + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { > + mod_delayed_work(system_wq, &doe_mb->statemachine, 0); > + return IRQ_HANDLED; > + } > + > + if (FIELD_GET(PCI_DOE_STATUS_INT_STATUS, val)) { > + pci_write_config_dword(pdev, offset + PCI_DOE_STATUS, > + PCI_DOE_STATUS_INT_STATUS); > + mod_delayed_work(system_wq, &doe_mb->statemachine, 0); > + return IRQ_HANDLED; > + } > + > + return IRQ_NONE; > +} PCIe 6.0, table 7-316 says that an interrupt is also raised when "the DOE Busy bit has been Cleared", yet such an interrupt is not handled here. It is incorrectly treated as a spurious interrupt by returning IRQ_NONE. The right thing to do is probably to wake the state machine in case it's polling for the Busy flag to clear. > +enum pci_doe_state { > + DOE_IDLE, > + DOE_WAIT_RESP, > + DOE_WAIT_ABORT, > + DOE_WAIT_ABORT_ON_ERR, > +}; > + > +#define PCI_DOE_FLAG_ABORT 0 > +#define PCI_DOE_FLAG_DEAD 1 That's all internal and should live in doe.c, not the header file. Thanks, Lukas