On 08/23/2019 03:58 PM, Chris Wilson wrote: > Quoting Xiaolin Zhang (2019-08-23 07:57:31) >> vgpu ppgtt notification was split into 2 steps, the first step is to >> update PVINFO's pdp register and then write PVINFO's g2v_notify register >> with action code to tirgger ppgtt notification to GVT side. >> >> currently these steps were not atomic operations due to no any protection, >> so it is easy to enter race condition state during the MTBF, stress and >> IGT test to cause GPU hang. >> >> the solution is to add a lock to make vgpu ppgtt notication as atomic >> operation. >> >> Cc: stable@xxxxxxxxxxxxxxx >> Signed-off-by: Xiaolin Zhang <xiaolin.zhang@xxxxxxxxx> >> --- >> drivers/gpu/drm/i915/i915_drv.h | 1 + >> drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++++ >> drivers/gpu/drm/i915/i915_vgpu.c | 1 + >> 3 files changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >> index eb31c16..32e17c4 100644 >> --- a/drivers/gpu/drm/i915/i915_drv.h >> +++ b/drivers/gpu/drm/i915/i915_drv.h >> @@ -961,6 +961,7 @@ struct i915_frontbuffer_tracking { >> }; >> >> struct i915_virtual_gpu { >> + struct mutex lock; >> bool active; >> u32 caps; >> }; >> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c >> index 2cd2dab..ff0b178 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c >> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c >> @@ -833,6 +833,8 @@ static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) >> enum vgt_g2v_type msg; >> int i; >> >> + mutex_lock(&dev_priv->vgpu.lock); >> + >> if (create) >> atomic_inc(px_used(ppgtt->pd)); /* never remove */ >> else >> @@ -860,6 +862,8 @@ static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) >> >> I915_WRITE(vgtif_reg(g2v_notify), msg); >> > How do you know the operation is complete and it is now safe to > overwrite the data regs? > -Chris > by design, the data reg value is copied out to use, so as long as the action and data is operated together, how long the operation is not a issue and it is safe to overwrite the data regs with new action next time. -BRs, Xiaolin