Re: [PATCH] pci: tegra: Revert raw_violation_fixup for tegra124

Please update the subject to follow the convention (run "git log --oneline
drivers/pci/controller/pci-tegra.c" to see it):

  PCI: tegra: Revert tegra124 raw_violation_fixup

On Fri, Jul 17, 2020 at 11:35:10PM +0200, Nicolas Chauvet wrote:
> As reported in https://bugzilla.kernel.org/206217 , raw_violation_fixup
> is causing more harm than good in some common use-cases.
> 
> This patch is a partial revert of the 191cd6fb5 commit:
>  "PCI: tegra: Add SW fixup for RAW violations"

Usual style is:

  191cd6fb5d2c ("PCI: tegra: Add SW fixup for RAW violations")

> that was first introduced in the 5.3-rc1 kernel.
> This fixes the following regression since then.
> 
> * Description:
> When both the NIC and MMC are used one can see the following message:
> 
> NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
> 
>   and
> 
> pcieport 0000:00:02.0: AER: Uncorrected (Non-Fatal) error received: 0000:01:00.0
> r8169 0000:01:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> r8169 0000:01:00.0: AER:   device [10ec:8168] error status/mask=00004000/00400000
> r8169 0000:01:00.0: AER:    [14] CmpltTO                (First)
> r8169 0000:01:00.0: AER: can't recover (no error_detected callback)
> pcieport 0000:00:02.0: AER: device recovery failed

Indent the quoted text (messages) two spaces so it's distinct from the
prose.

> After that, the ethernet NIC isn't functional anymore even after reloading
> the r8169 module.
> After a reboot, this is reproducible by copying a large file over the
> NIC to the MMC.

This looks like two paragraphs; if so, put a blank line between them.
Otherwise wrap them so they fill the line.  It's hard to read when
there are line breaks that look unnecessary.

> For some reason this cannot be reproduced when the same file is copied
> to a tmpfs.
> 
> * Little background on the fixup, by Manikanta Maddireddy:
>   "In the internal testing with dGPU on Tegra124, CmplTO is reported by
> dGPU. This happened because the FIFO queue in the AFI (AXI to PCIe) module
> gets full with upstream posted writes. Back-to-back upstream writes
> interleaved with infrequent reads trigger a RAW violation and CmpltTO.
> This is fixed by reducing the posted write credits and by changing
> updateFC timer frequency. These settings are fixed after stress test.
> 
> In the current case, the RTL NIC is also reporting CmplTO. These settings
> seem to be aggravating the issue instead of fixing it."
> 
> v1: first non-RFC version
>  - Disable raw_violation_fixup and fully remove unused code and macros

This version history can go after the "---" so it doesn't get included
in the final commit log.  It's nice if your subject line includes
"[PATCH v2]" or whatever is appropriate.

Add this just before your Signed-off-by:

  Fixes: 191cd6fb5d2c ("PCI: tegra: Add SW fixup for RAW violations")

> Signed-off-by: Nicolas Chauvet <kwizart@xxxxxxxxx>
> Reviewed-by: Manikanta Maddireddy <mmaddireddy@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.4.x

No "<>" needed around stable@xxxxxxxxxxxxxxx

You need not (and shouldn't) cc: stable@xxxxxxxxxxxxxxx when you post
this to the list.  The stable tag here in the commit log is
sufficient.  See Documentation/process/stable-kernel-rules.rst for more
details.

Is v5.4.x really the oldest kernel that should get this fix?  It looks
like 191cd6fb5d2c appeared in v5.3.
