On 01.06.2018 at 11:29, Huang Rui wrote:
> On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
>> On 01.06.2018 at 08:41, Huang Rui wrote:
>>> After deferring the execution of the gfx/compute IB tests, the gfx has
>>> already gone into a "mid state" of gfxoff by the time the tests run.
>>>
>>> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
>>> 0 = GFXOFF.
>>> 1 = Transition out of GFXOFF state.
>>> 2 = Not in GFXOFF.
>>> 3 = Transition into GFXOFF.
>>>
>>> If we hit a mid state (1 or 3), the doorbell write interrupt cannot wake
>>> the gfx back up successfully. The field value is 1 when we issue the IB
>>> test at that point, so we get the hang. This is the root cause of the
>>> issue we encountered.
>>>
>>> Meanwhile, we cannot set clockgating of GFX after gfx is already in the
>>> "off" state. So here we should move the gfx powergating and gfxoff
>>> enabling to the end of initialization, after the IB test and clockgating.
>> Mhm, that still looks like only a half-baked solution:
>>
>> 1. What prevents this bug from happening during "normal" IB submission
>> from userspace?
>>
>> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
>> are not in any transition phase instead?
>>
> Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
> amdgpu_ring_commit() after set_wptr, to confirm that the status is "0" or "2"?

You could add an end_use() callback for that, but I think we rather need
to do this in gfx_v9_0_ring_set_wptr_gfx() before we write the doorbell.

Christian.

>
> Thanks,
> Ray
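
To make the polling idea discussed above concrete, here is a minimal user-space sketch, not the actual amdgpu code. It decodes the PWR_GFXOFF_STATUS field (bits 2:1, as described in the thread) and polls until the value is 0 or 2. Everything else is an assumption made for illustration: the names `gfxoff_status_from_reg`, `wait_gfxoff_stable`, the `read_reg_fn` callback and the `fake_reg` test double are hypothetical, and a real driver would use its own register accessors and a delay between polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout as stated in the thread: PWR_GFXOFF_STATUS is bits 2:1. */
#define PWR_GFXOFF_STATUS_SHIFT 1u
#define PWR_GFXOFF_STATUS_MASK  (0x3u << PWR_GFXOFF_STATUS_SHIFT)

/* Status values as listed in the thread; the enum names are hypothetical. */
enum gfxoff_status {
    GFXOFF_STATUS_OFF      = 0, /* GFXOFF */
    GFXOFF_STATUS_LEAVING  = 1, /* transition out of GFXOFF */
    GFXOFF_STATUS_ON       = 2, /* not in GFXOFF */
    GFXOFF_STATUS_ENTERING = 3, /* transition into GFXOFF */
};

/* Hypothetical register accessor; stands in for the driver's real read helper. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Extract the 2-bit status field from a raw register value. */
static inline enum gfxoff_status gfxoff_status_from_reg(uint32_t reg)
{
    return (enum gfxoff_status)((reg & PWR_GFXOFF_STATUS_MASK) >>
                                PWR_GFXOFF_STATUS_SHIFT);
}

/* Only 0 and 2 are stable; 1 and 3 are the "mid states". */
static inline bool gfxoff_status_is_stable(enum gfxoff_status s)
{
    return s == GFXOFF_STATUS_OFF || s == GFXOFF_STATUS_ON;
}

/*
 * Poll the register at most max_polls times.
 * Returns true as soon as a stable state is seen, false on timeout.
 * A real driver would sleep or delay between reads; omitted here.
 */
static bool wait_gfxoff_stable(read_reg_fn read_reg, void *ctx,
                               unsigned int max_polls)
{
    assert(read_reg != NULL);
    for (unsigned int i = 0; i < max_polls; i++) {
        if (gfxoff_status_is_stable(gfxoff_status_from_reg(read_reg(ctx))))
            return true;
    }
    return false;
}

/* Test double: replays a scripted sequence of raw register values. */
struct fake_reg {
    const uint32_t *values;
    unsigned int count;
    unsigned int pos;
};

static uint32_t fake_reg_read(void *ctx)
{
    struct fake_reg *r = ctx;
    uint32_t v = r->values[r->pos];

    /* Stay on the last value once the script is exhausted. */
    if (r->pos + 1 < r->count)
        r->pos++;
    return v;
}
```

In the scheme Christian suggests, a call like `wait_gfxoff_stable()` would sit right before the doorbell write, and the caller would have to decide what to do on timeout (for example, log and skip the submission). That policy is not settled in this thread.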