On Wed, 17 Jul 2024 at 01:43, Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> wrote: > > > > On 7/16/2024 2:50 PM, Rob Clark wrote: > > On Tue, Jul 16, 2024 at 2:45 PM Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> wrote: > >> > >> > >> > >> On 7/15/2024 12:51 PM, Rob Clark wrote: > >>> On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov > >>> <dmitry.baryshkov@xxxxxxxxxx> wrote: > >>>> > >>>> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote: > >>>>> There is no recovery mechanism in place yet to recover from mmu > >>>>> faults for DPU. We can only prevent the faults by making sure there > >>>>> is no misconfiguration. > >>>>> > >>>>> Rate-limit the snapshot capture for mmu faults to once per > >>>>> msm_kms_init_aspace() as that should be sufficient to capture > >>>>> the snapshot for debugging otherwise there will be a lot of > >>>>> dpu snapshots getting captured for the same fault which is > >>>>> redundant and also might affect capturing even one snapshot > >>>>> accurately. > >>>> > >>>> Please squash this into the first patch. There is no need to add code > >>>> with a known defficiency. > >>>> > >>>> Also, is there a reason why you haven't used <linux/ratelimit.h> ? > >>> > >>> So, in some ways devcoredump is ratelimited by userspace needing to > >>> clear an existing devcore.. > >>> > >> > >> Yes, a new devcoredump device will not be created until the previous one > >> is consumed or times out but here I am trying to limit even the DPU > >> snapshot capture because DPU register space is really huge and the rate > >> at which smmu faults occur is quite fast that its causing instability > >> while snapshots are being captured. > >> > >>> What I'd suggest would be more useful is to limit the devcores to once > >>> per atomic update, ie. if display state hasn't changed, maybe an > >>> additional devcore isn't useful > >>> > >>> BR, > >>> -R > >>> > >> > >> By display state change, do you mean like the checks we have in > >> drm_atomic_crtc_needs_modeset()? > >> > >> OR do you mean we need to cache the previous (currently picked up by hw) > >> state and trigger a new devcores if the new state is different by > >> comparing more things? > >> > >> This will help to reduce the snapshots to unique frame updates but I do > >> not think it will reduce the rate enough for the case where DPU did not > >> recover from the previous fault. > > > > I was thinking the easy thing, of just resetting the counter in > > msm_atomic_commit_tail().. I suppose we could be clever filter out > > updates that only change scanout address. Or hash the atomic state > > and only generate devcoredumps for unique states. But I'm not sure > > how over-complicated we should make this. > > > > BR, > > -R > > Its a good idea actually and I would also like to keep it simple :) > > One question, is it okay to assume that all compositors will only issue > unique commits? Because we are assuming thats the case with resetting > the counter in msm_atomic_commit_tail(). The compositors use drm_mode_atomic_ioctl() which allocates a new state each time. It is impossible to re-submit the same drm_atomic_state from userspace. -- With best wishes Dmitry