On Tue, Dec 17, 2024 at 04:27:57PM -0800, Jessica Zhang wrote: > From: Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> > > There is no recovery mechanism in place yet to recover from mmu > faults for DPU. We can only prevent the faults by making sure there > is no misconfiguration. > > Rate-limit the snapshot capture for mmu faults to once per > msm_atomic_commit_tail() as that should be sufficient to capture > the snapshot for debugging otherwise there will be a lot of DPU > snapshots getting captured for the same fault which is redundant > and also might affect capturing even one snapshot accurately. > > Signed-off-by: Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> > Signed-off-by: Jessica Zhang <quic_jesszhan@xxxxxxxxxxx> > --- > drivers/gpu/drm/msm/msm_atomic.c | 2 ++ > drivers/gpu/drm/msm/msm_kms.c | 5 ++++- > drivers/gpu/drm/msm/msm_kms.h | 3 +++ > 3 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c > index 9c45d641b5212c11078ab38c13a519663d85e10a..9ad7eeb14d4336abd9d8a8eb1382bdddce80508a 100644 > --- a/drivers/gpu/drm/msm/msm_atomic.c > +++ b/drivers/gpu/drm/msm/msm_atomic.c > @@ -228,6 +228,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) > if (kms->funcs->prepare_commit) > kms->funcs->prepare_commit(kms, state); > > + kms->fault_snapshot_capture = 0; > + - Please move it before the prepare_commit(). - You are accessing the same variable from different threads / cores. There should be some kind of a sync barrier. > /* > * Push atomic updates down to hardware: > */ > diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c > index 78830e446355f77154fa21a5d107635bc88ba3ed..3327caf396d4fc905dc127f09515559c12666dc8 100644 > --- a/drivers/gpu/drm/msm/msm_kms.c > +++ b/drivers/gpu/drm/msm/msm_kms.c > @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned long iova, int flags, void > { > struct msm_kms *kms = arg; > > - msm_disp_snapshot_state(kms->dev); > + if (!kms->fault_snapshot_capture) { > + msm_disp_snapshot_state(kms->dev); > + kms->fault_snapshot_capture++; > + } > > return -ENOSYS; > } > diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h > index e60162744c669773b6e5aef824a173647626ab4e..3ac089e26e14b824567f3cd2c62f82a1b9ea9878 100644 > --- a/drivers/gpu/drm/msm/msm_kms.h > +++ b/drivers/gpu/drm/msm/msm_kms.h > @@ -128,6 +128,9 @@ struct msm_kms { > int irq; > bool irq_requested; > > + /* rate limit the snapshot capture to once per attach */ > + int fault_snapshot_capture; > + > /* mapper-id used to request GEM buffer mapped for scanout: */ > struct msm_gem_address_space *aspace; > > > -- > 2.34.1 > -- With best wishes Dmitry