PCIe device issue since v6.1.16

Recently, I discovered an issue with a PCIe device and recent kernels.

Background:

	Using a Linux host system, the PCIe device, a Torrent QN16e PCIe 16-128 Channel QAM Modulator, is passed through to a FreeBSD guest.

	Until recently, this was working as expected.

Issue:

	Starting with kernel v6.1.16, when the guest domain was started, the kernel would immediately report errors via vfio-pci.

	kernel: vfio-pci 0000:65:00.0: vfio_bar_restore: reset recovery - restoring BARs

	The guest would boot and its driver would load.  Then, when guest user-space accessed the device, a PCI system error (PERR) was raised,
	as reported by the IPMI event log, and the hardware itself would suffer a catastrophic event, cycling the server.

Discovery:

	After researching the issue, I found the commit that led to the system error:

	https://lore.kernel.org/all/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@xxxxxxxxx/

	Specifically, this removal:

	- Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
	is now performed by pci_bridge_wait_for_secondary_bus().  A static
	delay this long is only necessary for Conventional PCI, so modern
	PCIe systems benefit from shorter reset times as a side effect.
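
As a rough illustration of the trade-off that commit message describes, here is a toy user-space model contrasting a fixed post-reset delay with polling until the device responds. It is not kernel code and does not reproduce pci_bridge_wait_for_secondary_bus(); the names sim_device, wait_fixed and wait_polling are hypothetical, and time is simulated rather than slept.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: it starts answering once ready_after_ms
 * simulated milliseconds have elapsed since reset was deasserted. */
struct sim_device {
	unsigned int ready_after_ms;
};

static bool sim_device_ready(const struct sim_device *dev, unsigned int now_ms)
{
	return now_ms >= dev->ready_after_ms;
}

/* Strategy A: always wait a fixed time, whatever the device does.
 * Returns the simulated time spent waiting. */
static unsigned int wait_fixed(unsigned int delay_ms)
{
	return delay_ms;
}

/* Strategy B: check every poll_ms until the device answers or
 * timeout_ms has passed.  Returns the simulated time at which the
 * device first answered, or -1 on timeout. */
static int wait_polling(const struct sim_device *dev,
			unsigned int poll_ms, unsigned int timeout_ms)
{
	assert(poll_ms > 0);

	for (unsigned int now = 0; now <= timeout_ms; now += poll_ms) {
		if (sim_device_ready(dev, now))
			return (int)now;
	}
	return -1;
}
```

For a simulated device that answers after 100 ms, wait_polling(&dev, 10, 1000) returns 100, whereas wait_fixed(1000) always costs the full 1000. The model only shows why polling shortens reset times; it says nothing about why this particular device appears to need the longer static delay.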

Resolution:

	I reintroduced the 1 second delay to pci_reset_secondary_bus(), recompiled and installed the kernel, and the system is now working as expected.

	void pci_reset_secondary_bus(struct pci_dev *dev)
	{
		u16 ctrl;

		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
		ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);

		/*
		 * PCI spec v3.0 7.6.4.2 requires minimum Trst of 1ms.  Double
		 * this to 2ms to ensure that we meet the minimum requirement.
		 */
		msleep(2);

		ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);

		/* Reintroduced: static 1 second delay after deasserting the reset. */
		ssleep(1);
	}

PCIe Device:

	https://www.videopropulsion.com/content/torrent-qn16e-pcie-16-128-channel-qam-modulator

$ lspci

0000:65:00.0 Multimedia controller: Genroco, Inc Device 0004 (rev 01)
	Subsystem: Genroco, Inc Device 0004
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 349
	NUMA node: 0
	IOMMU group: 1
	Region 0: Memory at e0e00000 (32-bit, non-prefetchable) [size=32K]
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
		DevCtl:	CorrErr- NonFatalErr- FatalErr+ UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <4us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>
	Kernel driver in use: vfio-pci
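
For readers less familiar with lspci's flag notation: the Control line above renders the 16-bit PCI Command register, one flag per bit. The sketch below is a minimal user-space decoder using the standard Command register bit positions; the helper name format_pci_command is hypothetical, and the value 0x0146 is not taken from the report but derived from the flags shown (Mem+, BusMaster+, ParErr+, SERR+).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct cmd_flag {
	uint16_t mask;
	const char *name;	/* label as printed by lspci */
};

/* PCI Command register bits, lowest bit first. */
static const struct cmd_flag cmd_flags[] = {
	{ 0x0001, "I/O" },       { 0x0002, "Mem" },
	{ 0x0004, "BusMaster" }, { 0x0008, "SpecCycle" },
	{ 0x0010, "MemWINV" },   { 0x0020, "VGASnoop" },
	{ 0x0040, "ParErr" },    { 0x0080, "Stepping" },
	{ 0x0100, "SERR" },      { 0x0200, "FastB2B" },
	{ 0x0400, "DisINTx" },
};

/* Write an lspci-style flag list ("Name+" if set, "Name-" if clear)
 * into buf.  Output is truncated at a flag boundary if buf is small. */
static void format_pci_command(uint16_t cmd, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(cmd_flags) / sizeof(cmd_flags[0]); i++) {
		int n = snprintf(buf + used, len - used, "%s%s%c",
				 i ? " " : "", cmd_flags[i].name,
				 (cmd & cmd_flags[i].mask) ? '+' : '-');
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial flag */
			break;
		}
		used += (size_t)n;
	}
}
```

Calling format_pci_command(0x0146, buf, sizeof buf) with a 128-byte buffer reproduces the flag list on the Control line above. In particular, it shows that parity error response (ParErr+) and SERR# reporting (SERR+) are both enabled on this device.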

Regards,

Chad Schroeder



