On Tue, Jan 07, 2025 at 05:34:20PM -0800, Abhinav Kumar wrote: > > > On 12/18/2024 1:33 PM, Jessica Zhang wrote: > > > > > > On 12/18/2024 3:20 AM, Dmitry Baryshkov wrote: > > > On Tue, Dec 17, 2024 at 04:27:57PM -0800, Jessica Zhang wrote: > > > > From: Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> > > > > > > > > There is no recovery mechanism in place yet to recover from mmu > > > > faults for DPU. We can only prevent the faults by making sure there > > > > is no misconfiguration. > > > > > > > > Rate-limit the snapshot capture for mmu faults to once per > > > > msm_atomic_commit_tail() as that should be sufficient to capture > > > > the snapshot for debugging otherwise there will be a lot of DPU > > > > snapshots getting captured for the same fault which is redundant > > > > and also might affect capturing even one snapshot accurately. > > > > > > > > Signed-off-by: Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> > > > > Signed-off-by: Jessica Zhang <quic_jesszhan@xxxxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/msm/msm_atomic.c | 2 ++ > > > > drivers/gpu/drm/msm/msm_kms.c | 5 ++++- > > > > drivers/gpu/drm/msm/msm_kms.h | 3 +++ > > > > 3 files changed, 9 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_atomic.c > > > > b/drivers/gpu/drm/msm/msm_atomic.c > > > > index 9c45d641b5212c11078ab38c13a519663d85e10a..9ad7eeb14d4336abd9d8a8eb1382bdddce80508a > > > > 100644 > > > > --- a/drivers/gpu/drm/msm/msm_atomic.c > > > > +++ b/drivers/gpu/drm/msm/msm_atomic.c > > > > @@ -228,6 +228,8 @@ void msm_atomic_commit_tail(struct > > > > drm_atomic_state *state) > > > > if (kms->funcs->prepare_commit) > > > > kms->funcs->prepare_commit(kms, state); > > > > + kms->fault_snapshot_capture = 0; > > > > + > > > > > > - Please move it before the prepare_commit(). > > > - You are accessing the same variable from different threads / cores. > > > There should be some kind of a sync barrier. > > > > Hi Dmitry, > > > > Ack, will add a lock for the snapshot capture counter. > > > > Thanks, > > > > Jessica Zhang > > > > We cannot have a mutex lock because msm_kms_fault_handler cannot hold a > mutex. So we need an atomic variable in this case. Or a spinlock. > > > > > > > > /* > > > > * Push atomic updates down to hardware: > > > > */ > > > > diff --git a/drivers/gpu/drm/msm/msm_kms.c > > > > b/drivers/gpu/drm/msm/msm_kms.c > > > > index 78830e446355f77154fa21a5d107635bc88ba3ed..3327caf396d4fc905dc127f09515559c12666dc8 > > > > 100644 > > > > --- a/drivers/gpu/drm/msm/msm_kms.c > > > > +++ b/drivers/gpu/drm/msm/msm_kms.c > > > > @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, > > > > unsigned long iova, int flags, void > > > > { > > > > struct msm_kms *kms = arg; > > > > - msm_disp_snapshot_state(kms->dev); > > > > + if (!kms->fault_snapshot_capture) { > > > > + msm_disp_snapshot_state(kms->dev); > > > > + kms->fault_snapshot_capture++; > > > > + } > > > > return -ENOSYS; > > > > } > > > > diff --git a/drivers/gpu/drm/msm/msm_kms.h > > > > b/drivers/gpu/drm/msm/msm_kms.h > > > > index e60162744c669773b6e5aef824a173647626ab4e..3ac089e26e14b824567f3cd2c62f82a1b9ea9878 > > > > 100644 > > > > --- a/drivers/gpu/drm/msm/msm_kms.h > > > > +++ b/drivers/gpu/drm/msm/msm_kms.h > > > > @@ -128,6 +128,9 @@ struct msm_kms { > > > > int irq; > > > > bool irq_requested; > > > > + /* rate limit the snapshot capture to once per attach */ > > > > + int fault_snapshot_capture; > > > > + > > > > /* mapper-id used to request GEM buffer mapped for scanout: */ > > > > struct msm_gem_address_space *aspace; > > > > > > > > -- > > > > 2.34.1 > > > > > > > > > > -- > > > With best wishes > > > Dmitry > > -- With best wishes Dmitry