RE: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not mapped

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Srinivas, Vidya
> Sent: 23 November 2024 10:14
> To: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Syrjala, Ville <ville.syrjala@xxxxxxxxx>;
> bgeffon@xxxxxxxxxx; Lee, Shawn C <shawn.c.lee@xxxxxxxxx>
> Subject: RE: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not
> mapped
> 
> 
> 
> > -----Original Message-----
> > From: Srinivas, Vidya
> > Sent: 23 November 2024 09:39
> > To: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Syrjala, Ville
> > <ville.syrjala@xxxxxxxxx>; bgeffon@xxxxxxxxxx; Lee, Shawn C
> > <shawn.c.lee@xxxxxxxxx>
> > Subject: RE: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects
> > not mapped
> >
> >
> >
> > > -----Original Message-----
> > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > Sent: 23 November 2024 04:18
> > > To: Srinivas, Vidya <vidya.srinivas@xxxxxxxxx>
> > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Syrjala, Ville
> > > <ville.syrjala@xxxxxxxxx>; bgeffon@xxxxxxxxxx; Lee, Shawn C
> > > <shawn.c.lee@xxxxxxxxx>
> > > Subject: Re: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects
> > > not mapped
> > >
> > > On Fri, Nov 22, 2024 at 04:37:41PM +0530, Vidya Srinivas wrote:
> > > > Restricting all DPT objects as unshrinkable was causing some
> > > > chromebooks to run out of memory causing DMA remap failures.
> > > > Thanks to Brian Geffon for the pointers on debug and suggesting
> > > > usage of !obj->mm.mapping
> > > >
> > > > Fixes: 43e2b37e2ab6 ("drm/i915/dpt: Make DPT object unshrinkable")
> > > >
> > > > Credits-to: Brian Geffon <bgeffon@xxxxxxxxxx>
> > > > Suggested-by: Ville Syrj_l_ <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > Signed-off-by: Vidya Srinivas <vidya.srinivas@xxxxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > index 3dc61cbd2e11..b155f0139d4e 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > @@ -285,7 +285,7 @@ i915_gem_object_is_shrinkable(const struct
> > > > drm_i915_gem_object *obj)  {
> > > >  	/* TODO: make DPT shrinkable when it has no bound vmas */
> > > >  	return i915_gem_object_type_has(obj,
> > > I915_GEM_OBJECT_IS_SHRINKABLE) &&
> > > > -		!obj->is_dpt;
> > > > +		!(obj->is_dpt && obj->mm.mapping);
> > >
> > > AFAICS the mapping could only become NULL if the DPT got evicted
> > > from the GGTT, but if that can happen then I think the current code
> > > must still be busted wrt. suspend/resume even with the unshrinkable DPT
> obj.
> > >
> > > Looking at the vma suspend stuff I think the only way something
> > > could still be bound in the DPT during resume is if it was pinned
> > > during suspend. But we should be unpinning all FBs during suspend,
> > > so that's a bit weird in itself. Hmm, we do the unpinning from the
> > > cleanup_work so maybe that's still pending when we suspend and thus
> > > something is still pinned in the DPT. And indeed there is no
> > > flush_work()/etc. for that. So perhaps if we add that we could just
> > > revert the unshrinkable DPT
> > patch.
> > >
> > Hello Ville,
> > Thank you so much. Yes, Google team also insist on that as they worry
> > the DPT patch might just be a workaround for the real memory corruption.
> Sorry, just to be clear. With the DPT patch (43e2b37e2ab6 "drm/i915/dpt: Make
> DPT object unshrinkable"), Maybe its not that mapping became NULL. But
> since all DPT objs are unshrinkable, they are hitting low memory which is
> probably causing DMA remap failure when memory is trying to be cleared.
> "i915 0000:00:02.0: Failed to DMA remap 147457 pages"
> BUG: Bad page state in process chrome  pfn:14200 page dumped because: non-
> NULL mapping
> 
> So their suggestion initially was to try and use !obj->is_dpt || !obj-
> >mm.mapping as well so that shrinker could clear some more.
> However as you mentioned, if we could do something and revert the DPT patch
> entirely, it would be really very helpful
> 
> >
> > > Did we have a good way to reproduce the resume explosion? If not, I
> > > suppose we could try to confirm the lack of flush_work() as the
> > > culprit by simply sticking some kind of sleep() into the cleanup
> > > function to make sure it doesn't get a chance to run during suspend.
> > We can reproduce the original issue where during the suspend/resume,
> > the meteorlake chromebook goes for a reboot and we have the crash
> > captured from console-ramoops.
> For this, I meant we have a reproduction WITHOUT the DPT patch
> 43e2b37e2ab6.
> But, we don't have reproduction at Intel for the "DMA remap failed case"while
> having the unshrinkable DPT 43e2b37e2ab6 "drm/i915/dpt: Make DPT object
> unshrinkable"
> So, we could try reproducing the original issue and try with the pointers you
> have provided.
> >
> > >
> > > I also had this:
> > > https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1
> > > which would be good to have so that we can actully do the obvious
> > > flush_work(cleanup_work) instead of playing confusing tricks with
> > > the the commit_work.
> > >
> > Thank you so much. Will try this patch as well and update.

Hello Ville,
Tried your patch https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1
with reverted DPT patch but the original suspend/resume issue was still seen.

Just tried to check a bit more on the i915_ggtt_suspend_vm
If the VMA is bound (i915_vma_is_bound), it wasn't hitting __i915_vma_evict
Could that be any reason? Have pushed a RFC patch just to seek your opinion
https://patchwork.freedesktop.org/patch/625967/
Somehow with this patch, couldn't reproduce the suspend/resume issue (yet).
However, am not very sure of the consequences of these changes. Could you kindly
have a check and suggest please? Thank you so much.

Regards
Vidya

> >
> > Regards
> > Vidya
> >
> > > >  }
> > > >
> > > >  static inline bool
> > > > --
> > > > 2.34.1
> > >
> > > --
> > > Ville Syrjälä
> > > Intel




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux