Re: [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

Christian König <christian.koenig@xxxxxxx> · Wed, 21 Aug 2024 11:48:27 +0200

    Am 21.08.24 um 10:57 schrieb Thomas Hellström:

      On Wed, 2024-08-21 at 10:14 +0200, Christian König wrote:

        Am 20.08.24 um 18:00 schrieb Thomas Hellström:

            Or why exactly should shrinking fail?

          A common example would be not having runtime pm and the particular
bo
needs it to unbind, we want to try the next bo. Example: i915 GGTT
bound bos and Lunar Lake PL_TT bos.

        WHAT? So you basically block shrinking BOs because you can't unbind
them 
because the device is powered down?

I would say that this is a serious NO-GO. It basically means that 
powered down devices can lock down system memory for undefined amount
of 
time.

In other words an application can allocate memory, map it into GGTT
and 
then suspend or even get killed and we are not able to recover the 
memory because there is no activity on the GPU any more?

That really sounds like a bug in the driver design to me.

      It's bad but it's not as bad as it sounds.

Problem is we can't wake up during direct reclaim IIRC due to runtime
pm lockdep violations, but we can and do fire up a thread to wake the
device and after the wakeup delay have subsequent shrink calls succeed,
or punt to kswapd or the oom handler.

    Yeah that is obvious. The runtime PM is an interface designed to be
    used from a very high level IOCTL/system call.

    And delegating that from a shrinker to a worker is not valid as far
    as I can see, instead of reducing the memory pressure the shrinker
    would then increase it.

      I think that's an orthogonal discussion, though. There are other
reasons shrinking might fail, like the bo being busy in direct reclaim
(shouldn't wait for idle there but ok in kswapd), Other points of
failure is ofc shmem radix tree allocations (not seen one yet, though)
which might succeed with a smaller bo.
(Not saying, though, that there isn't more to be done with the xe
runtime pm implementation).

    I don't think that argumentation is valid.

    When a BO is locked then that it is ok to not shrink it, but TTM
    should be able to determine all those prerequisites.

    In other words the idea of a function returning a BO to the driver
    is that the driver is obligated to shrink that one.

    That other necessary allocation can fail like shmen for example is
    obvious as well, but that's why we discussed to allow shrinking BOs
    partially as well.

    And I really don't think this discussion is orthogonal. We are
    basically discussing what drivers should do and not should do. And
    as far as I can see the requirement to expose the LRUs to drivers
    comes up only because the driver wants to do something it shouldn't.

          And again, all other drm bo shrinkers do this. We just want to do
the
same.

        Do you have pointers?

      As Sima said, this is complicated but not beyond comprehension: i915
https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c#L317

    As far as I can tell what i915 does here is extremely questionable.

        if (sc->nr_scanned < sc->nr_to_scan &&
    current_is_kswapd()) {

    ....

            with_intel_runtime_pm(&i915->runtime_pm, wakeref) {

    with_intel_runtime_pm() then calls pm_runtime_get_sync().

    So basically the i915 shrinker assumes that when called from
    kswapd() that it can synchronously wait for runtime PM to power up
    the device again.

    As far as I can tell that means that a device driver makes strong
    and completely undocumented assumptions how kswapd works internally.

      msm:
https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c#L317
which uses
https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/gpu/drm/drm_gem.c#L1426
that is very similar in structure to what I implemented for TTM.

Panfrost: (although only purgeable objects AFAICT).
https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/gpu/drm/drm_gem.c#L1426

    From skimming over the source only MSM actually seems to use this
    and the criteria for rejecting shrinking is everything TTM should
    know, e.g. if a BO is pinned, idle etc...

              If we bump LRU we could end up with infinite loops.
So IMO we need to be able to loop. I don't really care wether
we do
this as an explicit loop or whether we use the LRU walker, but
I
think
from a maintainability point-of-view it is better to keep LRU
walking
in a single place.

If we return an unlocked object, we'd need to refcount and drop
the
lru
lock, but maybe that's not a bad thing.

But what's the main drawback of exporting the existing helper.

            Well that we re-creates exactly the mid-layer mess I worked so
hard
to
remove from TTM.

          It doesn't IMO. I agree the first attempt did. This affects only
the
LRU iteration itself and I'm even fine to get rid of the callback
using
a for_ macro.

        Well, I mean using a for_each approach is objectively better than
having 
a callback and a state bag.

But the fundamental question is if drivers are allowed to reject 
shrinking. And I think the answer is no, they need to be designed in
a 
way where shrinking is always possible.

      Rejects can be out of our control, due to anticipated deadlocks, oom
and deferring to kswapd.

        What can be that we can't get the necessary locks to evict and object
(because it's about to be used etc...), but that are the per-
requisites 
TTM should be checking.

              In any case, I don't think TTM should enforce a different way
of
shrinking by the means of a severely restricted helper?

            Well, as far as I can see that is exactly what TTM should do.

I mean the main advantage to make a common component is to
enforce
correct behavior.

          But if all other drivers don't agree this as correct behavior and
instead want to keep behavior that is proven to work, that's a dead
end.

        Well no, even if all drivers agree to (for example) drop security 
precautions it's still not something acceptable.

And same thing here, if we block shrinking because drivers think they
want their runtime PM implemented in a certain way then upstream
needs 
to block this and push back.

As far as I can see it's mandatory to have shrinkers not depend on 
runtime PM, cause otherwise you run into resources handling which 
depends on the well behavior of userspace and that in turn in
something 
we can't allow.

      Please see the above explanation for runtime pm, and for the record I
agree that enforcing disallowed or security violations is a completely
valid thing.

    Putting the TTM issue aside as far as I can tell what i915 is
    extremely questionable and doing the same thing in XE is most likely
    not something we should allow.

    Regards,

    Christian.

/Thomas

        Regards,
Christian.

          /Thomas

            Regards,
Christian.

              /Thomas