Thank you for your reply.

I found that CP_RB_WPTR has changed when "ring test failed" occurs, so I
think the CP is active, but what it gets from the ring buffer is wrong.
So I want to know whether there is a way to check the content that the
GPU gets from the ring buffer.

BTW, when I hibernate with

    echo shutdown > /sys/power/disk; echo disk > /sys/power/state

there is occasionally a "GPU reset", just like with suspend. However, if
I hibernate and wake up automatically with

    echo reboot > /sys/power/disk; echo disk > /sys/power/state

there is no "GPU reset" after hundreds of tests. What does this imply?
Does the power loss cause something to break?

Best regards,
Huacai Chen

> 2011/12/7 <chenhc@xxxxxxxxxx>:
>> When "MC timeout" happens at GPU reset, we found the 12th and 13th
>> bits of R_000E50_SRBM_STATUS are 1. From kernel code we found these
>> two bits are like this:
>> #define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1)
>> #define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1)
>>
>> Could you please tell me what they mean? And if possible,
>
> They refer to sub-blocks in the memory controller. I don't really
> know off hand what the names mean.
>
>> I want to know the functionalities of these 5 registers in detail:
>> #define R_000E60_SRBM_SOFT_RESET 0x0E60
>> #define R_000E50_SRBM_STATUS 0x0E50
>> #define R_008020_GRBM_SOFT_RESET 0x8020
>> #define R_008010_GRBM_STATUS 0x8010
>> #define R_008014_GRBM_STATUS2 0x8014
>>
>> A bit more info: If I reset the MC after resetting CP (this is what
>> Linux-2.6.34 does, but removed since 2.6.35), then "MC timeout" will
>> disappear, but there is still "ring test failed".
>
> The bits are defined in r600d.h.
> As to the acronyms:
> BIF - Bus InterFace
> CG - clocks
> DC - Display Controller
> GRBM - Graphics block (3D engine)
> HDP - Host Data Path (CPU access to vram via the PCI BAR)
> IH, RLC - Interrupt controller
> MC - Memory controller
> ROM - ROM
> SEM - semaphore controller
>
> When you reset the MC, you will probably have to reset just about
> everything else since most blocks depend on the MC for access to
> memory. If you do reset the MC, you should do it prior to calling
> asic_init so you make sure all the hw gets re-initialized properly.
> Additionally, you should probably reset the GRBM either via
> SRBM_SOFT_RESET or the individual sub-blocks via GRBM_SOFT_RESET.
>
> Alex
>
>>
>> Huacai Chen
>>
>>> 2011/11/8 <chenhc@xxxxxxxxxx>:
>>>> And, I want to know something:
>>>> 1, Does GPU use MC to access GTT?
>>>
>>> Yes. All GPU clients (display, 3D, etc.) go through the MC to access
>>> memory (vram or gart).
>>>
>>>> 2, What can cause MC timeout?
>>>
>>> Lots of things. Some GPU client still active, some GPU client hung or
>>> not properly initialized.
>>>
>>> Alex
>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Some status update.
>>>>> On September 29, 2011 at 5:17 PM, Chen Jie <chenj@xxxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>> Add more information.
>>>>>> We got occasionally "GPU lockup" after resuming from suspend (on
>>>>>> mipsel platform with a mips64 compatible CPU and rs780e, the
>>>>>> kernel is 3.1.0-rc8 64bit).
>>>>>> Related kernel message:
>>>>>> /* return from STR */
>>>>>> [ 156.152343] radeon 0000:01:05.0: WB enabled
>>>>>> [ 156.187500] [drm] ring test succeeded in 0 usecs
>>>>>> [ 156.187500] [drm] ib test succeeded in 0 usecs
>>>>>> [ 156.398437] ata2: SATA link down (SStatus 0 SControl 300)
>>>>>> [ 156.398437] ata3: SATA link down (SStatus 0 SControl 300)
>>>>>> [ 156.398437] ata4: SATA link down (SStatus 0 SControl 300)
>>>>>> [ 156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>>>>> [ 156.597656] ata1.00: configured for UDMA/133
>>>>>> [ 156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
>>>>>> [ 157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
>>>>>> [ 157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
>>>>>> [ 157.683593] r8169 0000:02:00.0: eth0: link up
>>>>>> [ 165.621093] PM: resume of devices complete after 9679.556 msecs
>>>>>> [ 165.628906] Restarting tasks ... done.
>>>>>> [ 177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
>>>>>> [ 177.089843] ------------[ cut here ]------------
>>>>>> [ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
>>>>>> [ 177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
>>>>>> [ 177.113281] Modules linked in: psmouse serio_raw
>>>>>> [ 177.117187] Call Trace:
>>>>>> [ 177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
>>>>>> [ 177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
>>>>>> [ 177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
>>>>>> [ 177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
>>>>>> [ 177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
>>>>>> [ 177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
>>>>>> [ 177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
>>>>>> [ 177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
>>>>>> [ 177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
>>>>>> [ 177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
>>>>>> [ 177.179687] ---[ end trace 92f63d998efe4c6d ]---
>>>>>> [ 177.187500] radeon 0000:01:05.0: GPU softreset
>>>>>> [ 177.191406] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xF57C2030
>>>>>> [ 177.195312] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00111103
>>>>>> [ 177.203125] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20023040
>>>>>> [ 177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
>>>>>> [ 177.367187] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
>>>>>> [ 177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
>>>>>> [ 177.414062] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
>>>>>> [ 177.417968] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
>>>>>> [ 177.425781] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x2002B040
>>>>>> [ 177.433593] radeon 0000:01:05.0: GPU reset succeed
>>>>>> [ 177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
>>>>>> [ 177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
>>>>>> [ 177.804687] radeon 0000:01:05.0: WB enabled
>>>>>> [ 178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
>>>>> After pinning the ring in VRAM, it warned of an ib test failure. It
>>>>> seems something is wrong with accessing through GTT.
>>>>>
>>>>> We dumped the gart table just after stopping the cp and compared it
>>>>> with the one dumped just after r600_pcie_gart_enable, and didn't
>>>>> find any difference.
>>>>>
>>>>> Any idea?
>>>>>
>>>>>> [ 178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
>>>>>> [ 178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
>>>>>> [ 178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
>>>>>> [ 179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
>>>>>> ...
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> -- Chen Jie
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
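
For anyone who wants to re-check the two SRBM_STATUS bits discussed in this thread against the logged values, here is a minimal sketch. The two G_000E50_* macros are the ones quoted above from the kernel; format_mc_busy() is a hypothetical helper written only for this illustration and is not part of the radeon driver.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Field accessors as quoted in the thread (defined in the kernel's r600d.h). */
#define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1)
#define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1)

/*
 * Hypothetical helper (an assumption for this example, not driver code):
 * writes a one-line summary of the two MC sub-block busy bits of an
 * SRBM_STATUS value into buf and returns what snprintf() returns.
 */
static int format_mc_busy(uint32_t srbm_status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "SRBM_STATUS=0x%08" PRIX32 " MCDX_BUSY=%u MCDW_BUSY=%u",
                    srbm_status,
                    (unsigned)G_000E50_MCDX_BUSY(srbm_status),
                    (unsigned)G_000E50_MCDW_BUSY(srbm_status));
}
```

Passing the value logged before the soft reset (0x20023040) and the one logged after it (0x2002B040) reports both MCDX_BUSY and MCDW_BUSY as set in each case, which matches the "Wait for MC idle timedout" messages on both sides of the reset.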