Lukas Wunner wrote: > The CDAT exposed in sysfs differs between little endian and big endian > arches: On big endian, every 4 bytes are byte-swapped. > > PCI Configuration Space is little endian (PCI r3.0 sec 6.1). Accessors > such as pci_read_config_dword() implicitly swap bytes on big endian. > That way, the macros in include/uapi/linux/pci_regs.h work regardless of > the arch's endianness. For an example of implicit byte-swapping, see > ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx > (Load Word Byte-Reverse Indexed). > > DOE Read/Write Data Mailbox Registers are unlike other registers in > Configuration Space in that they contain or receive a 4 byte portion of > an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f). > They need to be copied to or from the request/response buffer verbatim. > So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit > byte-swapping. > > The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume > implicit byte-swapping. Byte-swap requests after constructing them with > those macros and byte-swap responses before parsing them. > > Change the request and response type to __le32 to avoid sparse warnings. > > Fixes: c97006046c79 ("cxl/port: Read CDAT table") > Tested-by: Ira Weiny <ira.weiny@xxxxxxxxx> > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx # v6.0+ Good catch, a question below, but either way: Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> > --- > Changes v2 -> v3: > * Newly added patch in v3 > > drivers/cxl/core/pci.c | 12 ++++++------ > drivers/pci/doe.c | 13 ++++++++----- > include/linux/pci-doe.h | 8 ++++++-- > 3 files changed, 20 insertions(+), 13 deletions(-) > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 57764e9cd19d..d3cf1d9d67d4 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -480,7 +480,7 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport) > return NULL; > } > > -#define CDAT_DOE_REQ(entry_handle) \ > +#define CDAT_DOE_REQ(entry_handle) cpu_to_le32 \ > (FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE, \ > CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) | \ > FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE, \ > @@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task) > } > > struct cdat_doe_task { > - u32 request_pl; > - u32 response_pl[32]; > + __le32 request_pl; > + __le32 response_pl[32]; > struct completion c; > struct pci_doe_task task; > }; > @@ -531,7 +531,7 @@ static int cxl_cdat_get_length(struct device *dev, > if (t.task.rv < sizeof(u32)) > return -EIO; > > - *length = t.response_pl[1]; > + *length = le32_to_cpu(t.response_pl[1]); > dev_dbg(dev, "CDAT length %zu\n", *length); > > return 0; > @@ -548,7 +548,7 @@ static int cxl_cdat_read_table(struct device *dev, > do { > DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t); > size_t entry_dw; > - u32 *entry; > + __le32 *entry; > int rc; > > rc = pci_doe_submit_task(cdat_doe, &t.task); > @@ -563,7 +563,7 @@ static int cxl_cdat_read_table(struct device *dev, > > /* Get the CXL table access header entry handle */ > entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, > - t.response_pl[0]); > + le32_to_cpu(t.response_pl[0])); > entry = t.response_pl + 1; > entry_dw = t.task.rv / sizeof(u32); > /* Skip Header */ > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c > index 66d9ab288646..69efa9a250b9 100644 > --- a/drivers/pci/doe.c > +++ b/drivers/pci/doe.c > @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb, > length)); > for (i = 0; i < task->request_pl_sz / sizeof(u32); i++) > pci_write_config_dword(pdev, offset + PCI_DOE_WRITE, > - task->request_pl[i]); > + le32_to_cpu(task->request_pl[i])); What do you think about defining: int pci_doe_write(const struct pci_dev *dev, int where, __le32 val); int pci_doe_read(const struct pci_dev *dev, int where, __le32 *val); ...local to this file to make it extra clear that DOE transfers are passing raw byte-streams and not register values as pci_{write,read}_config_dword() expect.