On Sat, Mar 18, 2023 at 09:43:14PM -0500, Shanker Donthineni wrote:
> The T241 platform suffers from the T241-FABRIC-4 erratum which causes
> unexpected behavior in the GIC when multiple transactions are received
> simultaneously from different sources. This hardware issue impacts
> NVIDIA server platforms that use more than two T241 chips
> interconnected. Each chip has support for 320 {E}SPIs.
>
> This issue occurs when multiple packets from different GICs are
> incorrectly interleaved at the target chip. The erratum text below
> specifies exactly what can cause multiple transfer packets susceptible
> to interleaving and GIC state corruption. GIC state corruption can
> lead to a range of problems, including kernel panics, and unexpected
> behavior.
>
> From the erratum text:
> "In some cases, inter-socket AXI4 Stream packets with multiple
> transfers, may be interleaved by the fabric when presented to ARM
> Generic Interrupt Controller. GIC expects all transfers of a packet
> to be delivered without any interleaving.
>
> The following GICv3 commands may result in multiple transfer packets
> over inter-socket AXI4 Stream interface:
>  - Register reads from GICD_I* and GICD_N*
>  - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
>  - ITS command MOVALL
>
> Multiple commands in GICv4+ utilize multiple transfer packets,
> including VMOVP, VMOVI, VMAPP, and 64-bit register accesses."
>
> This issue impacts system configurations with more than 2 sockets,
> that require multi-transfer packets to be sent over inter-socket
> AXI4 Stream interface between GIC instances on different sockets.
> GICv4 cannot be supported. GICv3 SW model can only be supported
> with the workaround. Single and Dual socket configurations are not
> impacted by this issue and support GICv3 and GICv4."
>
> Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
>
> Writing to the chip alias region of the GICD_In{E} registers except
> GICD_ICENABLERn has an equivalent effect as writing to the global
> distributor. The SPI interrupt deactivate path is not impacted by
> the erratum.
>
> To fix this problem, implement a workaround that ensures read accesses
> to the GICD_In{E} registers are directed to the chip that owns the
> SPI, and disable GICv4.x features. To simplify code changes, the
> gic_configure_irq() function uses the same alias region for both read
> and write operations to GICD_ICFGR.
>
> Co-developed-by: Vikram Sethi <vsethi@xxxxxxxxxx>
> Signed-off-by: Vikram Sethi <vsethi@xxxxxxxxxx>
> Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
> ---
> Changes since v4:
>  - Resolve Marc's comments https://lore.kernel.org/all/871qlqif9v.wl-maz@xxxxxxxxxx/
> Changes since v3:
>  - Fix the build issue for the 32bit arch
> Changes since v2:
>  - Add accessors for the SOC-ID version & revision

The SMCCC/SOC ID part looks good to me. In case you spin another version
for any reason, I would prefer that you split those changes into a
separate patch. Otherwise,

Acked-by: Sudeep Holla <sudeep.holla@xxxxxxx> (for SMCCC/SOC ID bits)

--
Regards,
Sudeep
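
For readers following the workaround in the quoted commit message, the "direct the read to the chip that owns the SPI" step might look roughly like the sketch below. This is a standalone illustration, not the code from the patch: the helper name t241_spi_to_chip() is hypothetical, and the sketch assumes that regular SPIs are handed out to chips contiguously in blocks of 320, starting at INTID 32 (the first SPI INTID in the GIC architecture). The commit message only states that each chip supports 320 {E}SPIs, so the exact layout is an assumption, and extended SPIs are deliberately left out.

```c
#define SPI_BASE       32u    /* first SPI INTID in the GIC architecture */
#define SPI_LIMIT      1020u  /* INTIDs from 1020 up are not regular SPIs */
#define SPIS_PER_CHIP  320u   /* per the commit message: 320 {E}SPIs per chip */

/*
 * Hypothetical helper (not taken from the patch): return the index of the
 * chip assumed to own a regular SPI, or -1 if the INTID is outside the
 * regular SPI range.
 *
 * Assumption: chip N owns INTIDs [32 + 320 * N, 32 + 320 * (N + 1)).
 */
int t241_spi_to_chip(unsigned int intid)
{
	if (intid < SPI_BASE || intid >= SPI_LIMIT)
		return -1;

	return (int)((intid - SPI_BASE) / SPIS_PER_CHIP);
}
```

Under these assumptions, t241_spi_to_chip(352) returns 1, so a read of the GICD_In{E} state for that SPI would go through chip 1's alias region rather than the global distributor.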