On Thu, 6 Dec 2018 13:36:24 -0500, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:

> On Thu, Dec 6, 2018 at 1:05 PM Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> wrote:
> >
> > On Thu, 06 Dec 2018 18:18:23 +0100,
> > Markus Dobel <markus.dobel@xxxxxx> wrote:
> >
> > > Hi everyone,
> > >
> > > I will try on the weekend whether the hack mentioned fixes the issue
> > > for me (but I assume so, as it effectively removes the function).
> >
> > It should, but it keeps a few changes. Just want to be sure that what
> > would be left won't cause issues. If this works, the logic that
> > handles the Ryzen DMA fixes will be contained in a single point, making
> > it easier to maintain.
> >
> > > Just in case this is of interest, I have neither Ryzen nor Intel, but
> > > an HP Microserver G7 with an AMD Turion II Neo N54L, so the machine is
> > > more on the slow side.
> >
> > Good to know. It would probably be worth checking whether this Ryzen
> > bug occurs with all versions of it or with just a subset.
> > I mean: maybe it is only on the first gen of Ryzen and doesn't
> > affect Ryzen 2 (or vice versa).
>
> The original commit also mentions some Xeons are affected too. Seems
> like this is potentially an issue on the device side rather than the
> platform.

Maybe.

> > The PCI quirks logic will likely need to detect the PCI ID of
> > the memory controllers found on the buggy CPUs, in order to enable
> > the quirk only for the affected ones.
> >
> > It could be worth talking with AMD people in order to be sure about
> > the differences on the DMA engine side.
>
> It's not clear to me what the pci or platform quirk would do. The
> workaround seems to be in the driver, not on the platform.

Yeah, the fix should be in the driver, but drivers/pci/quirks.c would be
able to detect memory controllers that require a hack inside the
driver, in a similar way to what the media PCI drivers already do for
DMA controllers that don't support pci2pci transfers.
There, the PCI core (drivers/pci/pci.c and drivers/pci/quirks.c) sets a
flag (pci_pci_problems) indicating potential issues. The driver then
checks that flag in order to enable the specific quirk.

OK, there is also a different way to handle it. The driver could use
logic similar to what I wrote for drivers/edac/i7core_edac.c.

There, the logic looks for some specific PCI device IDs using
pci_get_device() and adjusts the code accordingly, depending on the
detected PCI devices.

In other words, running something like this (untested) at probe time
should produce similar results:

	/*
	 * FIXME: It probably makes sense to also be able to identify specific
	 * versions of the same PCI ID, just in case a later stepping got a
	 * fix for the issue.
	 */
	static const struct {
		int vendor, dev;
	} broken_dev_id[] = {
		{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_foo },
		{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_bar },
	};

	bool cx23885_does_dma_require_reset(void)
	{
		int i;
		struct pci_dev *pdev = NULL;

		for (i = 0; i < ARRAY_SIZE(broken_dev_id); i++) {
			pdev = pci_get_device(broken_dev_id[i].vendor,
					      broken_dev_id[i].dev, NULL);
			if (pdev) {
				/* Drop the reference taken by pci_get_device() */
				pci_dev_put(pdev);
				return true;
			}
		}
		return false;
	}

Should work.

In any case, we need to know which memory controllers have problems and
what their PCI IDs are, and add them (if not there yet) to
include/linux/pci_ids.h.

Thanks,
Mauro