> -----Original Message----- > From: Srinivas, Vidya > Sent: 23 November 2024 10:14 > To: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Syrjala, Ville <ville.syrjala@xxxxxxxxx>; > bgeffon@xxxxxxxxxx; Lee, Shawn C <shawn.c.lee@xxxxxxxxx> > Subject: RE: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not > mapped > > > > > -----Original Message----- > > From: Srinivas, Vidya > > Sent: 23 November 2024 09:39 > > To: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Syrjala, Ville > > <ville.syrjala@xxxxxxxxx>; bgeffon@xxxxxxxxxx; Lee, Shawn C > > <shawn.c.lee@xxxxxxxxx> > > Subject: RE: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects > > not mapped > > > > > > > > > -----Original Message----- > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > > Sent: 23 November 2024 04:18 > > > To: Srinivas, Vidya <vidya.srinivas@xxxxxxxxx> > > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Syrjala, Ville > > > <ville.syrjala@xxxxxxxxx>; bgeffon@xxxxxxxxxx; Lee, Shawn C > > > <shawn.c.lee@xxxxxxxxx> > > > Subject: Re: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects > > > not mapped > > > > > > On Fri, Nov 22, 2024 at 04:37:41PM +0530, Vidya Srinivas wrote: > > > > Restricting all DPT objects as unshrinkable was causing some > > > > chromebooks to run out of memory causing DMA remap failures. > > > > Thanks to Brian Geffon for the pointers on debug and suggesting > > > > usage of !obj->mm.mapping > > > > > > > > Fixes: 43e2b37e2ab6 ("drm/i915/dpt: Make DPT object unshrinkable") > > > > > > > > Credits-to: Brian Geffon <bgeffon@xxxxxxxxxx> > > > > Suggested-by: Ville Syrj_l_ <ville.syrjala@xxxxxxxxxxxxxxx> > > > > Signed-off-by: Vidya Srinivas <vidya.srinivas@xxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h > > > > b/drivers/gpu/drm/i915/gem/i915_gem_object.h > > > > index 3dc61cbd2e11..b155f0139d4e 100644 > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h > > > > @@ -285,7 +285,7 @@ i915_gem_object_is_shrinkable(const struct > > > > drm_i915_gem_object *obj) { > > > > /* TODO: make DPT shrinkable when it has no bound vmas */ > > > > return i915_gem_object_type_has(obj, > > > I915_GEM_OBJECT_IS_SHRINKABLE) && > > > > - !obj->is_dpt; > > > > + !(obj->is_dpt && obj->mm.mapping); > > > > > > AFAICS the mapping could only become NULL if the DPT got evicted > > > from the GGTT, but if that can happen then I think the current code > > > must still be busted wrt. suspend/resume even with the unshrinkable DPT > obj. > > > > > > Looking at the vma suspend stuff I think the only way something > > > could still be bound in the DPT during resume is if it was pinned > > > during suspend. But we should be unpinning all FBs during suspend, > > > so that's a bit weird in itself. Hmm, we do the unpinning from the > > > cleanup_work so maybe that's still pending when we suspend and thus > > > something is still pinned in the DPT. And indeed there is no > > > flush_work()/etc. for that. So perhaps if we add that we could just > > > revert the unshrinkable DPT > > patch. > > > > > Hello Ville, > > Thank you so much. Yes, Google team also insist on that as they worry > > the DPT patch might just be a workaround for the real memory corruption. > Sorry, just to be clear. With the DPT patch (43e2b37e2ab6 "drm/i915/dpt: Make > DPT object unshrinkable"), Maybe its not that mapping became NULL. But > since all DPT objs are unshrinkable, they are hitting low memory which is > probably causing DMA remap failure when memory is trying to be cleared. > "i915 0000:00:02.0: Failed to DMA remap 147457 pages" > BUG: Bad page state in process chrome pfn:14200 page dumped because: non- > NULL mapping > > So their suggestion initially was to try and use !obj->is_dpt || !obj- > >mm.mapping as well so that shrinker could clear some more. > However as you mentioned, if we could do something and revert the DPT patch > entirely, it would be really very helpful > > > > > > Did we have a good way to reproduce the resume explosion? If not, I > > > suppose we could try to confirm the lack of flush_work() as the > > > culprit by simply sticking some kind of sleep() into the cleanup > > > function to make sure it doesn't get a chance to run during suspend. > > We can reproduce the original issue where during the suspend/resume, > > the meteorlake chromebook goes for a reboot and we have the crash > > captured from console-ramoops. > For this, I meant we have a reproduction WITHOUT the DPT patch > 43e2b37e2ab6. > But, we don't have reproduction at Intel for the "DMA remap failed case"while > having the unshrinkable DPT 43e2b37e2ab6 "drm/i915/dpt: Make DPT object > unshrinkable" > So, we could try reproducing the original issue and try with the pointers you > have provided. > > > > > > > > I also had this: > > > https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1 > > > which would be good to have so that we can actully do the obvious > > > flush_work(cleanup_work) instead of playing confusing tricks with > > > the the commit_work. > > > > > Thank you so much. Will try this patch as well and update. Hello Ville, Tried your patch https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1 with reverted DPT patch but the original suspend/resume issue was still seen. Just tried to check a bit more on the i915_ggtt_suspend_vm If the VMA is bound (i915_vma_is_bound), it wasn't hitting __i915_vma_evict Could that be any reason? Have pushed a RFC patch just to seek your opinion https://patchwork.freedesktop.org/patch/625967/ Somehow with this patch, couldn't reproduce the suspend/resume issue (yet). However, am not very sure of the consequences of these changes. Could you kindly have a check and suggest please? Thank you so much. Regards Vidya > > > > Regards > > Vidya > > > > > > } > > > > > > > > static inline bool > > > > -- > > > > 2.34.1 > > > > > > -- > > > Ville Syrjälä > > > Intel