Re: [mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.

Jerome Glisse <j.glisse@xxxxxxxxx> · Thu, 16 Feb 2012 11:32:13 -0500

On Thu, Feb 16, 2012 at 05:21:10PM +0800, Chen Jie wrote:
> Hi,
> 
> On Feb 15, 2012 at 11:53 PM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> > To me it looks like the CP is trying to fetch memory but the
> > GPU memory controller fails to fulfill the CP request. Did you
> > check the PCI configuration before & after (when things don't
> > work)? My best guess is that PCI bus mastering is not working
> > properly or the PCIE GPU GART table has wrong data.
> >
> > Maybe one needs to drop bus master and re-enable bus master to
> > work around some bug...
> Thanks for your suggestion. We've tried the 'drop and re-enable master'
> trick; unfortunately it doesn't work.
> The PCI configuration comparison will be done later.
> 
> Some additional information:
> The "GPU lockup" always seems to occur after tasks are restarted. We
> inserted more ring tests, and none of them failed before the restart.
> 
> BTW, I hacked the GART table to try to simulate the problem:
> 1. Changed the system memory address (bus address) of ring_obj to an
> arbitrary value, e.g. 0 or 128M.
> 2. Changed the system memory address of a BO in radeon_test to an
> arbitrary value, e.g. 0.
> 
> Neither of the above led to a GPU lockup:
> Point 1 rendered a black screen;
> Point 2 only made the test itself fail.
> 
> Any idea?
> 

Ok, let's start from the beginning. I am convinced it's related to the GPU
memory controller failing to fulfill some request that hits system
memory. In another mail you wrote:

> BTW, I found that radeon_gart_bind() calls pci_map_page(), which hooks
> to swiotlb_map_page on our platform. That seems to allocate and return
> the dma_addr_t of a new page from the pool if the original page does not
> meet dma_mask. This seems to be a bug, since the BO is backed by one set
> of pages, but what gets mapped into the GART is another set of pages?

Is this still the case? It is obviously wrong, and we fixed that
recently. What drm code are you using? The rs780 dma mask is something
like 40 bits iirc, so you should never have this issue on your system
with 1G of memory, right?
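
To make the arithmetic concrete, here is a rough userspace sketch, not
driver code. It assumes the mask really is 40 bits (I have not checked)
and that your 1G of RAM shows up at bus addresses 0..1G, which may not
hold on your platform. dma_range_fits() is a made-up helper that only
models the kind of "does this page meet dma_mask" test; DMA_BIT_MASK is
redefined with the same shape as the kernel macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n), redefined for userspace. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: true when the whole range [addr, addr + size)
 * is addressable under the given DMA mask, i.e. the device could reach
 * the page directly and no bounce page would be needed.
 */
static bool dma_range_fits(uint64_t addr, uint64_t size, uint64_t mask)
{
    if (size == 0)
        return true;
    /* Written this way to avoid overflow in addr + size - 1. */
    return addr <= mask && size - 1 <= mask - addr;
}
```

With these assumptions, the last 4K page below 1G (address 0x3FFFF000)
fits under DMA_BIT_MASK(40), so a bounce page should never be needed.
If your platform reports a much smaller mask, or RAM sits at high bus
addresses, the same check fails and that would explain the remapping.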

If you have an iommu, what happens on resume? Are all pages previously
mapped with pci_map_page() still valid?

One good way to test the gart is to go over the GPU gart table and write
a dword using the GPU at the end of each page, something like 0xCAFEDEAD
or some value that is unlikely to be already set. Then go over
all the pages and check that the GPU writes succeeded. Abusing the scratch
register write back feature is the easiest way to try that.
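
The shape of that check, as a userspace sketch only. Nothing here is
radeon code: the "gart table" is modelled as an array of pointers the
GPU would write through, cpu_pages is what the CPU believes backs each
entry, and the memcpy in gpu_write_dword() stands in for the real
GPU-side write (e.g. scratch write back). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GART_TEST_PATTERN 0xCAFEDEADu
#define GART_PAGE_SIZE    4096u

/* Stand-in for a dword write performed by the GPU through the gart. */
static void gpu_write_dword(uint8_t *page, size_t offset, uint32_t value)
{
    memcpy(page + offset, &value, sizeof(value));
}

/*
 * Clear the last dword of every CPU-side page, let the "GPU" write the
 * pattern through its own table, then read back with the CPU.
 * Returns the index of the first page whose last dword does not hold
 * the pattern, or -1 if every page checks out.
 */
static long gart_pattern_test(uint8_t *const *gart_table,
                              uint8_t *const *cpu_pages, size_t npages)
{
    const size_t off = GART_PAGE_SIZE - sizeof(uint32_t);
    const uint32_t zero = 0;
    size_t i;

    assert(GART_PAGE_SIZE >= sizeof(uint32_t));

    /* Make sure a stale pattern cannot produce a false pass. */
    for (i = 0; i < npages; i++)
        memcpy(cpu_pages[i] + off, &zero, sizeof(zero));

    /* GPU side: write through whatever the table points at. */
    for (i = 0; i < npages; i++)
        gpu_write_dword(gart_table[i], off, GART_TEST_PATTERN);

    /* CPU side: verify the pages the BO is actually backed by. */
    for (i = 0; i < npages; i++) {
        uint32_t got;

        memcpy(&got, cpu_pages[i] + off, sizeof(got));
        if (got != GART_TEST_PATTERN)
            return (long)i;
    }
    return -1;
}
```

If a gart entry points at a different page than the one backing the BO
(the swiotlb case above, or a mapping that went stale across resume),
the returned index tells you which entry to look at first.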

Cheers,
Jerome