On Mon, 30 May 2022 21:06:57 +0200 Lukas Wunner <lukas@xxxxxxxxx> wrote: > On Thu, Apr 14, 2022 at 01:32:30PM -0700, ira.weiny@xxxxxxxxx wrote: > > +static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > [...] > > + pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0); > > + /* Read the second dword to get the length */ > > + pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val); > > + pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0); > > + > > + length = FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH, val); > > + if (length > SZ_1M || length < 2) > > + return -EIO; > > + > > This API requires consumers to know the size of the response in advance. > That's not always the case, as exemplified by SPDM VERSION responses. > Jonathan uses a kludge in his SPDM patch which submits a first request > with a fixed size of 2 versions and, if that turns out to be too small, > submits a second request. > > It would be nice if consumers are allowed to set request_pl to NULL. > Then a payload can be allocated here in pci_doe_recv_resp() with the > size retrieved above. > > A flag may be necessary to indicate that the response is heap-allocated > and needs to be freed upon destruction of the pci_doe_task. If possible I'd make that the callers problem. It should know it provided NULL and expected a response. I'd suggest making this a 'future' feature just to keep this initial version simple. Won't be hard to add later. > > > > + /* First 2 dwords have already been read */ > > + length -= 2; > > + /* Read the rest of the response payload */ > > + for (i = 0; i < min(length, task->response_pl_sz / sizeof(u32)); i++) { > > + pci_read_config_dword(pdev, offset + PCI_DOE_READ, > > + &task->response_pl[i]); > > + pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0); > > + } > > You need to check the Data Object Ready bit. The device may clear the > bit prematurely (e.g. as a result of a concurrent FLR or Conventional > Reset). You'll continue reading zero dwords from the mailbox and > pretend success to the caller even though the response is truncated. > > If you're concerned about performance when checking the bit on every > loop iteration, checking it only on the last but one iteration should > be sufficient to detect truncation. Good catch - I hate corner cases. Thankfully this one is trivial to check for. > > > > + *vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl); > > + *protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL, > > + response_pl); > > + *index = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX, > > + response_pl); > > The fact that you need line breaks here is an indication that the > macros are too long. > > > > +/* DOE Data Object - note not actually registers */ > > +#define PCI_DOE_DATA_OBJECT_HEADER_1_VID 0x0000ffff > > +#define PCI_DOE_DATA_OBJECT_HEADER_1_TYPE 0x00ff0000 > > +#define PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH 0x0003ffff > > + > > +#define PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX 0x000000ff > > +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID 0x0000ffff > > +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00ff0000 > > +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xff000000 > > I'd get rid of "DATA_OBJECT_" everywhere, it's redundant with the > "Data Object" in "DOE". We need something to make it clear these aren't DVSEC headers for example, but definitely could be shorter. PCI_DOE_OBJ_... perhaps? > > > > +#define PCI_DOE_STATUS_INT_STATUS 0x00000002 /* DOE Interrupt Status */ > > Another redundancy, I would get rid of the second "_STATUS". Hmm. I'll leave this one to Ira's discretion but I'm not keen on cropping too much of naming given loss of alignment with the spec. > > > > +#define PCI_DOE_STATUS_DATA_OBJECT_READY 0x80000000 /* Data Object Ready */ > > I would shorten to PCI_DOE_STATUS_READY. Seems reasonable though always some burden in moving away from full spec names. > > > > Simplify submitting work to the mailbox > > Replace pci_doe_exchange_sync() with > > pci_doe_submit_task() Consumers of the mailbox > > are now responsible for setting up callbacks > > within a task object and submitting them to the > > mailbox to be processed. > > I honestly think that's a mistake. All consumers both in the CDAT > as well as in the SPDM series just want to wait for completion of > the task. They have no need for an arbitrary callback and shouldn't > be burdended with providing one. It just unnecessarily complicates > the API. > > So only providing pci_doe_exchange_sync() and doing away with > pci_doe_submit_task() would seem like a more appropriate approach. We've gone around the houses with this. At this stage I don't care strongly either way and it's all in kernel code so we can refine it later once we have lots of examples. > > > > +/** > > + * pci_doe_for_each_off - Iterate each DOE capability > > + * @pdev: struct pci_dev to iterate > > + * @off: u16 of config space offset of each mailbox capability found > > + */ > > +#define pci_doe_for_each_off(pdev, off) \ > > + for (off = pci_find_next_ext_capability(pdev, off, \ > > + PCI_EXT_CAP_ID_DOE); \ > > + off > 0; \ > > + off = pci_find_next_ext_capability(pdev, off, \ > > + PCI_EXT_CAP_ID_DOE)) > > + > > +struct pci_doe_mb *pci_doe_create_mb(struct pci_dev *pdev, u16 cap_offset, > > + bool use_irq); > > +void pci_doe_destroy_mb(struct pci_doe_mb *doe_mb); > > +bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); > > Drivers should not be concerned with the intricacies of DOE > capabilities and mailboxes. > > Moreover, the above API doesn't allow different drivers to access > the same DOE mailbox concurrently, e.g. if that mailbox supports > multiple protocols. There's no locking to serialize access to the > mailbox by the drivers. > > This should be moved to the PCI core instead: In pci_init_capabilities(), > add a new call pci_doe_init() which enumerates all DOE capabilities. > Add a list_head to struct pci_dev and add each DOE instance found > to that list. Destroy the list elements in pci_destroy_dev(). > No locking needed for the list_head, you only ever modify the list > on device enumeration and destruction. > > Then provide a pci_doe_find_mailbox() library function which drivers > call to retrieve the pci_doe_mb for a given pci_dev/vid/type tuple. > That avoids the need to traverse the list time and again. I've lost track a bit as this thread has been running a long time, but I think we are moving back to something that looks more like this (and the much earlier versions of the set which did more or less what you describe - though they had their own issues resolved in the meantime) There is an exciting corner around interrupts though that complicates things. Currently I think I'm right in saying the pci core never sets up interrupts, that is always a job for the drivers. My first thought is to do something similar to the hack I did for the switch ports, and use polled mode to query support protocols. Then later call from driver that will attempt to enable interrupts if desired for a given DOE instance. > > > > +/** > > + * struct pci_doe_mb - State for a single DOE mailbox > > We generally use the same terms as the spec to make it easier for > readers to connect the language in the spec to the implementation. > > The spec uniformly refers to "DOE instance". I guess "mailbox" > is slightly more concise, so keep that, but please at least mention > the term "instance" in the struct's kernel-doc. > > This implementation uses the term "task" for one request/response. > That term is not mentioned in the spec at all. The spec refers to > "exchange" and "transfer" on several occasions, so I would have chosen > either one of those instead of the somewhat unfortunate "task". > > > > + * This state is used to manage a single DOE mailbox capability. All fields > > + * should be considered opaque to the consumers and the structure passed into > > + * the helpers below after being created by devm_pci_doe_create() > > If the struct is considered opaque, why is it exposed in a public > header file? Just use a forward declaration in the header > so that consumers can pass around pointers to the struct, > and hide the declaration proper in doe.c. > > > > + * @pdev: PCI device this belongs to mailbox belongs to > ^^^^^^^^^^ > Typo. > > > + * @prots: Array of protocols supported on this DOE > > + * @num_prots: Size of prots array > > Use @prots instead of prots everywhere in the kernel-doc. > > > > + /* > > + * NOTE: doe_mb_prots is freed by pci_doe_free_mb automatically on > > + * error if pci_doe_cache_protocols() fails past this point. > > + */ > > s/doe_mb_prots/doe_mb->prots/ > s/pci_doe_free_mb/pci_doe_free_mb()/ > > > > + /* DOE requests must be a whole number of DW */ > > + if (task->request_pl_sz % sizeof(u32)) > > + return -EINVAL; > > It would be nice if this restriction could be lifted. SPDM uses > requests which are not padded to dword-length. It can run over other > transports which may not impose such restrictions. The SPDM layer > should not need to worry about quirks of the transport layer. This is a pain. DOE absolutely requires 32 bit padding and reads need to be 32 bit. We can obviously do something nasty with dealing with the tail via a bounce buffer though. I'd pencil that in as a future feature though rather than worry about it before we have any support at all in place. > > > > +static irqreturn_t pci_doe_irq_handler(int irq, void *data) > > +{ > > + struct pci_doe_mb *doe_mb = data; > > + struct pci_dev *pdev = doe_mb->pdev; > > + int offset = doe_mb->cap_offset; > > + u32 val; > > + > > + pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val); > > + > > + /* Leave the error case to be handled outside IRQ */ > > + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { > > + mod_delayed_work(system_wq, &doe_mb->statemachine, 0); > > + return IRQ_HANDLED; > > + } > > + > > + if (FIELD_GET(PCI_DOE_STATUS_INT_STATUS, val)) { > > + pci_write_config_dword(pdev, offset + PCI_DOE_STATUS, > > + PCI_DOE_STATUS_INT_STATUS); > > + mod_delayed_work(system_wq, &doe_mb->statemachine, 0); > > + return IRQ_HANDLED; > > + } > > + > > + return IRQ_NONE; > > +} > > PCIe 6.0, table 7-316 says that an interrupt is also raised when > "the DOE Busy bit has been Cleared", yet such an interrupt is > not handled here. It is incorrectly treated as a spurious > interrupt by returning IRQ_NONE. The right thing to do > is probably to wake the state machine in case it's polling > for the Busy flag to clear. Ah. I remember testing this via a lot of hacking on the QEMU code to inject the various races that can occur (it was really ugly to do). Guess we lost the handling at some point. I think your fix is the right one. > > > > +enum pci_doe_state { > > + DOE_IDLE, > > + DOE_WAIT_RESP, > > + DOE_WAIT_ABORT, > > + DOE_WAIT_ABORT_ON_ERR, > > +}; > > + > > +#define PCI_DOE_FLAG_ABORT 0 > > +#define PCI_DOE_FLAG_DEAD 1 > > That's all internal and should live in doe.c, not the header file. > > Thanks, > > Lukas > Thanks for the review and thanks again to Ira for taking this forwards. Jonathan