Re: [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

On Wed, 2024-10-02 at 13:32 +0200, Christian König wrote:
> Ah, yes sorry totally forgotten about that.
> 
> Give me till Friday to swap everything back into my head again.
> 
> Christian.

Thanks!

Once we agree on a direction MBrost is ready to do an in-depth review.

/Thomas


> 
> Am 02.10.24 um 13:30 schrieb Thomas Hellström:
> > Hi, Christian,
> > 
> > Ping? Can i get an ack to proceed with this?
> > 
> > Thanks,
> > Thomas
> > 
> > 
> > 
> > On Wed, 2024-09-18 at 14:57 +0200, Thomas Hellström wrote:
> > > Sima, Christian
> > > 
> > > I've updated the shrinker series now with a guarded for_each
> > > macro
> > > instead:
> > > 
> > > https://patchwork.freedesktop.org/patch/614514/?series=131815&rev=9
> > > 
> > > (Note I forgot to remove the export of the previous LRU walker).
> > > 
> > >   so the midlayer argument is now not an issue anymore. The
> > > cleanup.h
> > > guard provides some additional protection against drivers exiting
> > > the
> > > LRU loop early.
> > > 
> > > So remaining is the question whether the driver is allowed to
> > > discard
> > > a
> > > suggested bo to shrink from TTM.
> > > 
> > > Arguments for:
> > > 
> > > 1) Not allowing that would require teaching TTM about purgeable
> > > objects.
> > > 2) Devices who need the blitter during shrinking would want to
> > > punt
> > > runtime_pm_get() to kswapd to avoid sleeping direct reclaim.
> > > 3) If those devices end up blitting (LNL) to be able to shrink,
> > > they
> > > would want to punt waiting for the fence to signal to kswapd to
> > > avoid
> > > waiting in direct reclaim.
> > > 4) It looks like we need to resort to folio_trylock in the shmem
> > > backup
> > > backend when shrinking is called for gfp_t = GFP_NOFS. A failing
> > > trylock will require a new bo.
> > > 
> > > Arguments against:
> > > None really. I thought the idea of demidlayering would be to
> > > allow
> > > the
> > > driver more freedom.
> > > 
> > > So any feedback appreciated. If that is found acceptable we can
> > > proceed
> > > with reviewing this patch and also with the shrinker series.
> > > 
> > > Thanks,
> > > Thomas
> > > 
> > > 
> > > On Mon, 2024-09-02 at 13:07 +0200, Daniel Vetter wrote:
> > > > On Wed, Aug 28, 2024 at 02:20:34PM +0200, Christian König
> > > > wrote:
> > > > > Am 27.08.24 um 19:53 schrieb Daniel Vetter:
> > > > > > On Tue, Aug 27, 2024 at 06:52:13PM +0200, Daniel Vetter
> > > > > > wrote:
> > > > > > > On Thu, Aug 22, 2024 at 03:19:29PM +0200, Christian König
> > > > > > > wrote:
> > > > > > > > Completely agree that this is complicated, but I still
> > > > > > > > don't
> > > > > > > > see the need
> > > > > > > > for it.
> > > > > > > > 
> > > > > > > > Drivers just need to use pm_runtime_get_if_in_use()
> > > > > > > > inside
> > > > > > > > the shrinker and
> > > > > > > > postpone all hw activity until resume.
> > > > > > > Not good enough, at least long term I think. Also
> > > > > > > postponing
> > > > > > > hw
> > > > > > > activity
> > > > > > > to resume doesn't solve the deadlock issue, if you still
> > > > > > > need
> > > > > > > to grab ttm
> > > > > > > locks on resume.
> > > > > > Pondered this specific aspect some more, and I think you
> > > > > > still
> > > > > > have a race
> > > > > > here (even if you avoid the deadlock): If the condiditional
> > > > > > rpm_get call
> > > > > > fails there's no guarantee that the device will
> > > > > > suspend/resume
> > > > > > and clean
> > > > > > up the GART mapping.
> > > > > Well I think we have a major disconnect here. When the device
> > > > > is
> > > > > powered
> > > > > down there is no GART mapping to clean up any more.
> > > > > 
> > > > > In other words GART is a table in local memory (VRAM) when
> > > > > the
> > > > > device is
> > > > > powered down this table is completely destroyed. Any BO which
> > > > > was
> > > > > mapped
> > > > > inside this table is now not mapped any more.
> > > > > 
> > > > > So when the shrinker wants to evict a BO which is marked as
> > > > > mapped
> > > > > to GART
> > > > > and the device is powered down we just skip the GART
> > > > > unmapping
> > > > > part
> > > > > because
> > > > > that has already implicitly happened during power down.
> > > > > 
> > > > > Before mapping any BO into the GART again we power the GPU up
> > > > > through the
> > > > > runtime PM calls. And while powering it up again the GART is
> > > > > restored.
> > > > My point is that you can't tell whether the device will power
> > > > down
> > > > or
> > > > not,
> > > > you can only tell whether there's a chance it might be powering
> > > > down
> > > > and
> > > > so you can't get at the rpm reference without deadlock issues.
> > > > 
> > > > > > The race gets a bit smaller if you use
> > > > > > pm_runtime_get_if_active(), but even then you might catch
> > > > > > it
> > > > > > right when
> > > > > > resume almost finished.
> > > > > What race are you talking about?
> > > > > 
> > > > > The worst thing which could happen is that we restore a GART
> > > > > entry
> > > > > which
> > > > > isn't needed any more, but that is pretty much irrelevant
> > > > > since
> > > > > we
> > > > > only
> > > > > clear them to avoid some hw bugs.
> > > > The race I'm seeing is where you thought the GART entry is not
> > > > issue,
> > > > tossed an object, but the device didn't suspend, so might still
> > > > use
> > > > it.
> > > > 
> > > > I guess if we're clearly separating the sw allocation of the
> > > > TTM_TT
> > > > with
> > > > the physical entries in the GART that should all work, but
> > > > feels a
> > > > bit
> > > > tricky. The race I've seen is essentially these two getting out
> > > > of
> > > > sync.
> > > > 
> > > > So maybe it was me who's stuck.
> > > > 
> > > > What I wonder is whether it works in practice, since on the
> > > > restore
> > > > side
> > > > you need to get some locks to figure out which gart mappings
> > > > exist
> > > > and
> > > > need restoring. And that's the same locks as the shrinker needs
> > > > to
> > > > figure
> > > > out whether it might need to reap a gart mapping.
> > > > 
> > > > Or do you just copy the gart entries over and restore them
> > > > exactly
> > > > as-is,
> > > > so that there's no shared locks?
> > > > 
> > > > > > That means we'll have ttm bo hanging around with GART
> > > > > > allocations/mappings
> > > > > > which aren't actually valid anymore (since they might
> > > > > > escape
> > > > > > the
> > > > > > cleanup
> > > > > > upon resume due to the race). That doesn't feel like a
> > > > > > solid
> > > > > > design
> > > > > > either.
> > > > > I'm most likely missing something, but I'm really scratching
> > > > > my
> > > > > head where
> > > > > you see a problem here.
> > > > I guess one issue is that at least traditionally, igfx drivers
> > > > have
> > > > nested
> > > > runtime pm within dma_resv lock. And dgpu drivers the other way
> > > > round.
> > > > Which is a bit awkward if you're trying for common code.
> > > > 
> > > > Cheers, Sima
> 





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux