> -----Original Message-----
> From: 'Joerg Roedel' [mailto:jroedel@xxxxxxx]
> Sent: Tuesday, March 28, 2017 6:26 PM
> To: Deucher, Alexander
> Cc: 'Joerg Roedel'; Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Daniel Drake; Nath, Arindam
> Subject: Re: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
>
> On Tue, Mar 28, 2017 at 09:13:23PM +0000, Deucher, Alexander wrote:
> > If I understand Arindam's patch correctly, it only flushes TLB entries
> > for domains in the flush queue whereas the previous behavior was to
> > flush all domains. If there was no TLB flush in the queue for that
> > domain, could flushing it cause a problem?
>
> No, that can't cause a problem. An io/tlb flush for the device is just a
> message that the device should invalidate its own tlb. The device can't
> know and doesn't need to know whether the page-tables it used to fill
> the tlb really changed.
>
> As it looks, the problem we are seeing here is that we are sending a
> large amount of these requests to the GPU device, and wait for its
> completion every time. This shouldn't be a problem for ATS devices, but
> the GPU here seems to fail at some point and doesn't answer to the
> invalidation request anymore, causing the completion-wait loop timeouts.
>
> Arindam's patch makes the high flush-frequency less likely, but it can
> still happen, depending on how the GPU is used. So its the best to
> keep ATS disabled on the device as it doesn't work correctly and we risk
> running in the same problem again when we leave it enabled and just make
> the trigger less likely.

Thanks for clarifying. The patch is:
Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>