arm64: Getting continuous PCIe "CmpltTO" AER from network card in kdump kernel

Hi All,

I am facing an issue with the kdump kernel on Marvell's ARM64 ThunderX2.
Here the network card continuously reports the following AER error:
[  100.839168] igb 0000:09:00.1: AER: aer_status: 0x00004000, aer_mask: 0x00000000
[  100.846463] igb 0000:09:00.1: AER:    [14] CmpltTO                (First)
[  100.861491] igb 0000:09:00.1: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
[  100.869400] igb 0000:09:00.1: AER: aer_uncor_severity: 0x00062011

This error is not 100% reproducible; it happens in about 1 out of 4 tries.

The error goes away in either of the following two scenarios:
A) Putting the IOMMU in bypass mode via the boot argument iommu.passthrough=1
B) Adding a ~100 ms wait in arm_smmu_device_reset() in drivers/iommu/arm-smmu-v3.c:
        if (reg & CR0_SMMUEN) {
                dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
                WARN_ON(is_kdump_kernel() && !disable_bypass);
                mdelay(100);  /* added delay */
                arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
        }

From A), it is clear that the problem is related to the IOMMU.
From B), it looks like the network card is still active while the kdump
kernel boots and has already sent some requests over PCIe. Because the
GBPA_ABORT bit is set, no response/completion comes back to the PCIe
controller, hence the "CmpltTO" error.

Ideally, before setting the GBPA_ABORT bit, there should be some check for
active transactions. If that is not possible, the code should wait long
enough to ensure that no pending transactions are left.

Why hasn't any such delay been considered?

--pk

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec