Re: [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

Christian König <christian.koenig@xxxxxxx> · Wed, 21 Aug 2024 10:14:34 +0200

    Am 20.08.24 um 18:00 schrieb Thomas Hellström:

        Or why exactly should shrinking fail?

      A common example would be not having runtime pm and the particular bo
needs it to unbind, we want to try the next bo. Example: i915 GGTT
bound bos and Lunar Lake PL_TT bos.

    WHAT? So you basically block shrinking BOs because you can't unbind
    them because the device is powered down?

    I would say that this is a serious NO-GO. It basically means that
    powered down devices can lock down system memory for undefined
    amount of time.

    In other words an application can allocate memory, map it into GGTT
    and then suspend or even get killed and we are not able to recover
    the memory because there is no activity on the GPU any more?

    That really sounds like a bug in the driver design to me.

      And again, all other drm bo shrinkers do this. We just want to do the
same.

    Do you have pointers?

          If we bump LRU we could end up with infinite loops.
So IMO we need to be able to loop. I don't really care wether we do
this as an explicit loop or whether we use the LRU walker, but I
think
from a maintainability point-of-view it is better to keep LRU
walking
in a single place.

If we return an unlocked object, we'd need to refcount and drop the
lru
lock, but maybe that's not a bad thing.

But what's the main drawback of exporting the existing helper.

        Well that we re-creates exactly the mid-layer mess I worked so hard
to 
remove from TTM.

      It doesn't IMO. I agree the first attempt did. This affects only the
LRU iteration itself and I'm even fine to get rid of the callback using
a for_ macro.

    Well, I mean using a for_each approach is objectively better than
    having a callback and a state bag.

    But the fundamental question is if drivers are allowed to reject
    shrinking. And I think the answer is no, they need to be designed in
    a way where shrinking is always possible.

    What can be that we can't get the necessary locks to evict and
    object (because it's about to be used etc...), but that are the
    per-requisites TTM should be checking.

          In any case, I don't think TTM should enforce a different way of
shrinking by the means of a severely restricted helper?

        Well, as far as I can see that is exactly what TTM should do.

I mean the main advantage to make a common component is to enforce 
correct behavior.

      But if all other drivers don't agree this as correct behavior and
instead want to keep behavior that is proven to work, that's a dead
end.

    Well no, even if all drivers agree to (for example) drop security
    precautions it's still not something acceptable.

    And same thing here, if we block shrinking because drivers think
    they want their runtime PM implemented in a certain way then
    upstream needs to block this and push back.

    As far as I can see it's mandatory to have shrinkers not depend on
    runtime PM, cause otherwise you run into resources handling which
    depends on the well behavior of userspace and that in turn in
    something we can't allow.

    Regards,

    Christian. 

/Thomas

        Regards,
Christian.

          /Thomas