On Tue, 2012-02-21 at 18:37 +0800, Chen Jie wrote:
> On 17 February 2012 at 5:27 PM, Chen Jie <chenj@xxxxxxxxxx> wrote:
> >> One good way to test gart is to go over GPU gart table and write a
> >> dword using the GPU at end of each page something like 0xCAFEDEAD
> >> or somevalue that is unlikely to be already set. And then go over
> >> all the page and check that GPU write succeed. Abusing the scratch
> >> register write back feature is the easiest way to try that.
> > I'm planning to add a GART table check procedure when resume, which
> > will go over GPU gart table:
> > 1. read(backup) a dword at end of each GPU page
> > 2. write a mark by GPU and check it
> > 3. restore the original dword
> Attachment validateGART.patch do the job:
> * It current only works for mips64 platform.
> * To use it, apply all_in_vram.patch first, which will allocate CP
> ring, ih, ib in VRAM and hard code no_wb=1.
>
> The gart test routine will be invoked in r600_resume. We've tried it,
> and find that when lockup happened the gart table was good before
> userspace restarting. The related dmesg follows:
> [ 1521.820312] [drm] r600_gart_table_validate(): Validate GART Table
> at 9000000040040000, 32768 entries, Dummy
> Page[0x000000000e004000-0x000000000e007fff]
> [ 1522.019531] [drm] r600_gart_table_validate(): Sweep 32768
> entries(valid=8544, invalid=24224, total=32768).
> ...
> [ 1531.156250] PM: resume of devices complete after 9396.588 msecs
> [ 1532.152343] Restarting tasks ... done.
> [ 1544.468750] radeon 0000:01:05.0: GPU lockup CP stall for more than 10003msec
> [ 1544.472656] ------------[ cut here ]------------
> [ 1544.480468] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:243
> radeon_fence_wait+0x25c/0x314()
> [ 1544.488281] GPU lockup (waiting for 0x0002136B last fence id 0x0002136A)
> ...
> [ 1544.886718] radeon 0000:01:05.0: Wait for MC idle timedout !
> [ 1545.046875] radeon 0000:01:05.0: Wait for MC idle timedout !
> [ 1545.062500] radeon 0000:01:05.0: WB disabled
> [ 1545.097656] [drm] ring test succeeded in 0 usecs
> [ 1545.105468] [drm] ib test succeeded in 0 usecs
> [ 1545.109375] [drm] Enabling audio support
> [ 1545.113281] [drm] r600_gart_table_validate(): Validate GART Table
> at 9000000040040000, 32768 entries, Dummy
> Page[0x000000000e004000-0x000000000e007fff]
> [ 1545.125000] [drm:r600_gart_table_validate] *ERROR* Iter=0:
> unexpected value 0x745aaad1(expect 0xDEADBEEF)
> entry=0x000000000e008067, orignal=0x745aaad1
> ...
> /* System blocked here. */
>
> Any idea?

I know lockups are frustrating. My only idea is that the memory
controller locks up because of some failing PCI <-> system RAM
transaction.

> BTW, we find the following in r600_pcie_gart_enable()
> (drivers/gpu/drm/radeon/r600.c):
> WREG32(VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR,
>        (u32)(rdev->dummy_page.addr >> 12));
>
> On our platform, PAGE_SIZE is 16K, does it have any problem?

No, this should be handled properly.

> Also in radeon_gart_unbind() and radeon_gart_restore(), the logic
> should change to:
> for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
>         radeon_gart_set_page(rdev, t, page_base);
> -       page_base += RADEON_GPU_PAGE_SIZE;
> +       if (page_base != rdev->dummy_page.addr)
> +               page_base += RADEON_GPU_PAGE_SIZE;
> }
> ???

No need to do so: the dummy page will be 16K too, so it's fine.

Cheers,
Jerome
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
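
For readers following the thread, the three-step check quoted above (back up a dword at the end of each GPU page, write a mark through the GPU and check it, restore the original dword) can be sketched in user space roughly as below. This is only an illustration and not the code in validateGART.patch: the GPU write is replaced by a hypothetical callback type gpu_write_fn, the names validate_pages, good_write and dropped_write are made up here, and the "pages" are an ordinary buffer rather than GART-mapped memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPU_PAGE_SIZE 4096u   /* 4K GPU page, as RADEON_GPU_PAGE_SIZE in the thread */
#define MARK 0xDEADBEEFu      /* marker value that appears in the quoted dmesg */

/* Hypothetical stand-in for "write one dword using the GPU". */
typedef void (*gpu_write_fn)(uint8_t *page, size_t offset, uint32_t value);

/* Walk npages consecutive GPU pages; return how many failed the mark test. */
static size_t validate_pages(uint8_t *pages, size_t npages, gpu_write_fn gpu_write)
{
    const size_t off = GPU_PAGE_SIZE - sizeof(uint32_t); /* last dword of a page */
    size_t bad = 0;

    for (size_t i = 0; i < npages; i++) {
        uint8_t *page = pages + i * GPU_PAGE_SIZE;
        uint32_t saved, seen;

        memcpy(&saved, page + off, sizeof saved);  /* 1. back up the dword */
        gpu_write(page, off, MARK);                /* 2. write the mark ... */
        memcpy(&seen, page + off, sizeof seen);    /*    ... and read it back */
        if (seen != MARK)
            bad++;
        memcpy(page + off, &saved, sizeof saved);  /* 3. restore the original */
    }
    return bad;
}

/* A write that lands in memory, modelling a healthy mapping. */
static void good_write(uint8_t *page, size_t offset, uint32_t value)
{
    memcpy(page + offset, &value, sizeof value);
}

/* A write that is silently lost, modelling a broken mapping. */
static void dropped_write(uint8_t *page, size_t offset, uint32_t value)
{
    (void)page; (void)offset; (void)value;
}
```

As the original suggestion notes, the mark only helps if it is unlikely to be in memory already; a page whose last dword happens to equal the mark would pass even with a lost write.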
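
The point about 16K pages can be checked with a little arithmetic. With PAGE_SIZE = 16K and a 4K GPU page, each CPU page is covered by four consecutive GART entries, and because the dummy page is itself a full 16K page, stepping page_base by one GPU page per entry never leaves it. The sketch below mirrors the unmodified loop from the quoted diff in plain C. The helper name fill_entries and the output array are assumptions made for illustration; the real driver calls radeon_gart_set_page() instead of storing addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GPU_PAGE_SIZE 4096u   /* 4K, as RADEON_GPU_PAGE_SIZE in the thread */

/*
 * Hypothetical helper: compute the GPU-page addresses that cover one CPU
 * page starting at page_base. Returns the number of entries written to out.
 */
static size_t fill_entries(uint64_t page_base, size_t cpu_page_size,
                           uint64_t *out, size_t out_len)
{
    /* The CPU page must be a whole number of GPU pages. */
    assert(cpu_page_size % GPU_PAGE_SIZE == 0);

    size_t n = cpu_page_size / GPU_PAGE_SIZE;
    assert(n <= out_len);

    for (size_t j = 0; j < n; j++) {
        out[j] = page_base;           /* stands in for radeon_gart_set_page() */
        page_base += GPU_PAGE_SIZE;   /* unconditional step, as in the original loop */
    }
    return n;
}
```

Using the dummy page range from the quoted dmesg (0x0e004000 to 0x0e007fff), the four entries start at 0x0e004000, 0x0e005000, 0x0e006000 and 0x0e007000, so the last one ends exactly at 0x0e007fff.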