Re: [PATCH v4] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4

Hi Marc,

On 3/18/23 04:44, Marc Zyngier wrote:

On Sat, 18 Mar 2023 04:58:12 +0000,
Shanker Donthineni <sdonthineni@xxxxxxxxxx> wrote:

The T241 platform suffers from the T241-FABRIC-4 erratum which causes
unexpected behavior in the GIC when multiple transactions are received
simultaneously from different sources. This hardware issue impacts
NVIDIA server platforms that interconnect more than two T241 chips.
Each chip supports 320 {E}SPIs.

This issue occurs when multiple packets from different GICs are
incorrectly interleaved at the target chip. The erratum text below
specifies exactly which operations generate multi-transfer packets
that are susceptible to interleaving and GIC state corruption. GIC
state corruption can lead to a range of problems, including kernel
panics and unexpected behavior.

 From the erratum text:
   "In some cases, inter-socket AXI4 Stream packets with multiple
   transfers, may be interleaved by the fabric when presented to ARM
   Generic Interrupt Controller. GIC expects all transfers of a packet
   to be delivered without any interleaving.

   The following GICv3 commands may result in multiple transfer packets
   over inter-socket AXI4 Stream interface:
    - Register reads from GICD_I* and GICD_N*
    - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
    - ITS command MOVALL

   Multiple commands in GICv4+ utilize multiple transfer packets,
   including VMOVP, VMOVI, VMAPP, and 64-bit register accesses.

   This issue impacts system configurations with more than 2 sockets,
   that require multi-transfer packets to be sent over inter-socket
   AXI4 Stream interface between GIC instances on different sockets.
   GICv4 cannot be supported. GICv3 SW model can only be supported
   with the workaround. Single and Dual socket configurations are not
   impacted by this issue and support GICv3 and GICv4."

Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf

Writing to the chip alias region of the GICD_In{E} registers, except
GICD_ICENABLERn, has the same effect as writing to the global
distributor. The SPI interrupt deactivate path is not impacted by
the erratum.

To fix this problem, implement a workaround that ensures read accesses
to the GICD_In{E} registers are directed to the chip that owns the
SPI, and disables GICv4.x features for KVM. To simplify code changes,
the gic_configure_irq() function uses the same alias region for both
read and write operations to GICD_ICFGR.
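
For illustration only, and not taken from the patch: a minimal C sketch
of how an SPI could be mapped to the chip that owns it, so that a read
can be steered to that chip's alias region. It assumes that SPIs start
at INTID 32 (as in the GIC architecture) and that each chip owns one
contiguous block of 320 SPIs. The contiguous layout, the macro names
and the helper spi_owner_chip() are hypothetical; extended SPIs are not
handled here.

```c
/* Assumption: first SPI INTID, per the GIC architecture. */
#define SPI_BASE_INTID 32u
/* From the commit text: each chip supports 320 {E}SPIs. */
#define SPIS_PER_CHIP  320u

/*
 * Hypothetical helper: return the index of the chip assumed to own the
 * given SPI, under the assumption that chips own consecutive blocks of
 * SPIS_PER_CHIP interrupts starting at SPI_BASE_INTID. The caller must
 * pass an INTID >= SPI_BASE_INTID.
 */
static unsigned int spi_owner_chip(unsigned int intid)
{
	return (intid - SPI_BASE_INTID) / SPIS_PER_CHIP;
}
```

Under these assumptions, a read of a GICD_In{E} register for a given
SPI would go through the alias base of chip spi_owner_chip(intid)
rather than through the global distributor base.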

Co-developed-by: Vikram Sethi <vsethi@xxxxxxxxxx>
Signed-off-by: Vikram Sethi <vsethi@xxxxxxxxxx>
Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
---
Changes since v3:
  - Fix the build issue for the 32bit arch
Changes since v2:
  - Add accessors for the SOC-ID version & revision
  - Include "linux/bitfield.h" and "linux/bits.h" in irq-gic-v3.c
Changes since v1:
  - Use SMCCC SOC-ID API for detecting the T241 chip
  - Implement Marc's suggestions
  - Edit commit text

You seem to have ignored most of my comments on v2[1] apart from the
SOC_ID stuff. I guess I'll wait for v5...

         M.

[1] https://lore.kernel.org/all/871qlqif9v.wl-maz@xxxxxxxxxx/


Sorry, I did not intentionally ignore your input. Unfortunately, I lost
track of this specific email in Outlook. Your feedback is valuable, and
we will ensure that all of your review comments are addressed in v5.

-Shanker



