On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
> On 01.06.2018 at 08:41, Huang Rui wrote:
> > After deferring the execution of the gfx/compute IB tests, the gfx block
> > has, by that time, already entered a "mid state" of GFXOFF.
> >
> > PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
> > 0 = GFXOFF.
> > 1 = Transition out of GFXOFF state.
> > 2 = Not in GFXOFF.
> > 3 = Transition into GFXOFF.
> >
> > If we hit a mid state (1 or 3), the doorbell-write interrupt cannot wake
> > the gfx block back up. The field value was 1 when we issued the IB test,
> > so we got the hang. This is the root cause of the issue we encountered.
> >
> > Meanwhile, we cannot set GFX clockgating after gfx is already in the "off"
> > state. So we should move the gfx powergating and GFXOFF enabling to the
> > end of initialization, after the IB tests and clockgating.
>
> Mhm, that still looks like only a half-baked solution:
>
> 1. What prevents this bug from happening during "normal" IB submission
> from userspace?
>
> 2. Shouldn't we instead poll the PWR_MISC_CNTL_STATUS register to make
> sure we are not in any transition phase?
>

Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
amdgpu_ring_commit(), after set_wptr, to confirm that the status is "0"
or "2"?

Thanks,
Ray