Re: [PATCH] Revert 95f408bb Ryzen DMA related RiSC engine stall fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Em Thu, 6 Dec 2018 13:36:24 -0500
Alex Deucher <alexdeucher@xxxxxxxxx> escreveu:

> On Thu, Dec 6, 2018 at 1:05 PM Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> wrote:
> >
> > Em Thu, 06 Dec 2018 18:18:23 +0100
> > Markus Dobel <markus.dobel@xxxxxx> escreveu:
> >  
> > > Hi everyone,
> > >
> > > I will try if the hack mentioned fixes the issue for me on the weekend (but I assume, as if effectively removes the function).  
> >
> > It should, but it keeps a few changes. Just want to be sure that what
> > would be left won't cause issues. If this works, the logic that would
> > solve Ryzen DMA fixes will be contained into a single point, making
> > easier to maintain it.
> >  
> > >
> > > Just in case this is of interest, I neither have Ryzen nor Intel, but an HP Microserver G7 with an AMD Turion II Neo  N54L, so the machine is more on the slow side.  
> >
> > Good to know. It would probably worth to check if this Ryzen
> > bug occors with all versions of it or with just a subset.
> > I mean: maybe it is only at the first gen or Ryzen and doesn't
> > affect Ryzen 2 (or vice versa).  
> 
> The original commit also mentions some Xeons are affected too.  Seems
> like this is potentially an issue on the device side rather than the
> platform.

Maybe.

> >
> > The PCI quirks logic will likely need to detect the PCI ID of
> > the memory controllers found at the buggy CPUs, in order to enable
> > the quirk only for the affected ones.
> >
> > It could be worth talking with AMD people in order to be sure about
> > the differences at the DMA engine side.
> >  
> 
> It's not clear to me what the pci or platform quirk would do.  The
> workaround seems to be in the driver, not on the platform.

Yeah, the fix should be at the driver, but pci/quirk.c would be able
to detect memory controllers that would require a hack inside the
driver, in a similar way to what the media PCI drivers already do
for DMA controllers that don't support pci2pci transfers.

There, basically the PCI core (drivers/pci/pci.c and 
drivers/pci/quirks.c) sets a flag (pci_pci_problems) indicating
potential issues.

Then, the driver compares such flag in order to enable the specific quirk.

Ok, there would be a different way to handle it. The driver could use a 
logic similar to the one I wrote for drivers/edac/i7core_edac.c. There,
the logic seeks for some specific PCI device IDs using pci_get_device(),
adjusting the code accordingly, depending on the detected PCI devices.

In other words, running something like this (untested), at probe time should
produce similar results:

	/*
	 * FIXME: It probably makes sense to also be able to identify specific
	 * versions of the same PCI ID, just in case a latter stepping got a
	 * fix for the issue.
	 */
	const static struct {
		int vendor, dev;
	} broken_dev_id[] = {
		PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_foo,
		PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_bar,
	},

	bool cx23885_does_dma_require_reset(void) 
	{
		int i;
		struct pci_dev *pdev = NULL;

		for (i = 0; i < sizeof(broken_dev_id); i++) {
			pdev = pci_get_device(broken_dev_id[i].vendor, broken_dev_id[i].dev, NULL);
			if (pdev) {
				pci_put_device(pdev);
				return true;
			}
		}
		return false;
	}

Should work. In any case, we need to know what memory controllers 
have problems, and what are their PCI IDs, and add them (if not there yet)
at include/linux/pci_ids.h

Thanks,
Mauro



[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux