Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

"Koenig, Christian" <Christian.Koenig@xxxxxxx> · Thu, 16 May 2019 14:08:38 +0000

Am 16.05.19 um 14:28 schrieb Daniel Vetter:
> [CAUTION: External Email]
>
> On Thu, May 16, 2019 at 09:25:31AM +0200, Christian König wrote:
>> Am 16.05.19 um 09:16 schrieb Koenig, Christian:
>>> Am 16.05.19 um 04:29 schrieb Kenny Ho:
>>>> [CAUTION: External Email]
>>>>
>>>> On Wed, May 15, 2019 at 5:26 PM Welty, Brian <brian.welty@xxxxxxxxx> wrote:
>>>>> On 5/9/2019 2:04 PM, Kenny Ho wrote:
>>>>>> There are four control file types,
>>>>>> stats (ro) - display current measured values for a resource
>>>>>> max (rw) - limits for a resource
>>>>>> default (ro, root cgroup only) - default values for a resource
>>>>>> help (ro, root cgroup only) - help string for a resource
>>>>>>
>>>>>> Each file is multi-lined with one entry/line per drm device.
>>>>> Multi-line is correct for multiple devices, but I believe you need
>>>>> to use a KEY to denote device for both your set and get routines.
>>>>> I didn't see your set functions reading a key, or the get functions
>>>>> printing the key in output.
>>>>> cgroups-v2 conventions mention using KEY of major:minor, but I think
>>>>> you can use drm_minor as key?
>>>> Given this controller is specific to the drm kernel subsystem which
>>>> uses minor to identify drm device,
>>> Wait a second, using the DRM minor is a good idea in the first place.
>> Well that should have read "is not a good idea"..
> What else should we use?

Well what does for example udev uses to identify a device?

>> Christian.
>>
>>> I have a test system with a Vega10 and a Vega20. Which device gets which
>>> minor is not stable, but rather defined by the scan order of the PCIe bus.
>>>
>>> Normally the scan order is always the same, but adding or removing
>>> devices or delaying things just a little bit during init is enough to
>>> change this.
>>>
>>> We need something like the Linux sysfs location or similar to have a
>>> stable implementation.
> You can go from sysfs location to drm class directory (in sysfs) and back.
> That means if you care you need to walk sysfs yourself a bit, but using
> the drm minor isn't a blocker itself.

Yeah, agreed that userspace could do this. But I think if there is an of 
hand alternative we should use this instead.

> One downside with the drm minor is that it's pretty good nonsense once you
> have more than 64 gpus though, due to how we space render and legacy nodes
> in the minor ids :-)

Ok, another good reason to at least not use the minor=linenum approach.

Christian.

> -Daniel
>>> Regards,
>>> Christian.
>>>
>>>>     I don't see a need to complicate
>>>> the interfaces more by having major and a key.  As you can see in the
>>>> examples below, the drm device minor corresponds to the line number.
>>>> I am not sure how strict cgroup upstream is about the convention but I
>>>> am hoping there are flexibility here to allow for what I have
>>>> implemented.  There are a couple of other things I have done that is
>>>> not described in the convention: 1) inclusion of read-only *.help file
>>>> at the root cgroup, 2) use read-only (which I can potentially make rw)
>>>> *.default file instead of having a default entries (since the default
>>>> can be different for different devices) inside the control files (this
>>>> way, the resetting of cgroup values for all the drm devices, can be
>>>> done by a simple 'cp'.)
>>>>
>>>>>> Usage examples:
>>>>>> // set limit for card1 to 1GB
>>>>>> sed -i '2s/.*/1073741824/' /sys/fs/cgroup/<cgroup>/drm.buffer.total.max
>>>>>>
>>>>>> // set limit for card0 to 512MB
>>>>>> sed -i '1s/.*/536870912/' /sys/fs/cgroup/<cgroup>/drm.buffer.total.max
>>>>>>     /** @file drm_gem.c
>>>>>> @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>>>>>>          obj->handle_count = 0;
>>>>>>          obj->size = size;
>>>>>>          drm_vma_node_reset(&obj->vma_node);
>>>>>> +
>>>>>> +     obj->drmcgrp = get_drmcgrp(current);
>>>>>> +     drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
>>>>> Why do the charging here?
>>>>> There is no backing store yet for the buffer, so this is really tracking something akin to allowed virtual memory for GEM objects?
>>>>> Is this really useful for an administrator to control?
>>>>> Isn't the resource we want to control actually the physical backing store?
>>>> That's correct.  This is just the first level of control since the
>>>> backing store can be backed by different type of memory.  I am in the
>>>> process of adding at least two more resources.  Stay tuned.  I am
>>>> doing the charge here to enforce the idea of "creator is deemed owner"
>>>> at a place where the code is shared by all (the init function.)
>>>>
>>>>>> +     while (i <= max_minor && limits != NULL) {
>>>>>> +             sval =  strsep(&limits, "\n");
>>>>>> +             rc = kstrtoll(sval, 0, &val);
>>>>> Input should be "KEY VALUE", so KEY will determine device to apply this to.
>>>>> Also, per cgroups-v2 documentation of limits, I believe need to parse and handle the special "max" input value.
>>>>>
>>>>> parse_resources() in rdma controller is example for both of above.
>>>> Please see my previous reply for the rationale of my hope to not need
>>>> a key.  I can certainly add handling of "max" and "default".
>>>>
>>>>
>>>>>> +void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
>>>>>> +             size_t size)
>>>>> Shouldn't this return an error and be implemented with same semantics as the
>>>>> try_charge() functions of other controllers?
>>>>> Below will allow stats_total_allocated to overrun limits_total_allocated.
>>>> This is because I am charging the buffer at the init of the buffer
>>>> which does not fail so the "try" (drmcgrp_bo_can_allocate) is separate
>>>> and placed earlier and nearer other condition where gem object
>>>> allocation may fail.  In other words, there are multiple possibilities
>>>> for which gem allocation may fail (cgroup limit being one of them) and
>>>> satisfying cgroup limit does not mean a charge is needed.  I can
>>>> certainly combine the two functions to have an additional try_charge
>>>> semantic as well if that is really needed.
>>>>
>>>> Regards,
>>>> Kenny
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel