Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 16.05.19 um 09:16 schrieb Koenig, Christian:
Am 16.05.19 um 04:29 schrieb Kenny Ho:
[CAUTION: External Email]

On Wed, May 15, 2019 at 5:26 PM Welty, Brian <brian.welty@xxxxxxxxx> wrote:
On 5/9/2019 2:04 PM, Kenny Ho wrote:
There are four control file types,
stats (ro) - display current measured values for a resource
max (rw) - limits for a resource
default (ro, root cgroup only) - default values for a resource
help (ro, root cgroup only) - help string for a resource

Each file is multi-lined with one entry/line per drm device.
Multi-line is correct for multiple devices, but I believe you need
to use a KEY to denote device for both your set and get routines.
I didn't see your set functions reading a key, or the get functions
printing the key in output.
cgroups-v2 conventions mention using KEY of major:minor, but I think
you can use drm_minor as key?
Given this controller is specific to the drm kernel subsystem which
uses minor to identify drm device,
Wait a second, using the DRM minor is a good idea in the first place.

Well that should have read "is not a good idea"..

Christian.


I have a test system with a Vega10 and a Vega20. Which device gets which
minor is not stable, but rather defined by the scan order of the PCIe bus.

Normally the scan order is always the same, but adding or removing
devices or delaying things just a little bit during init is enough to
change this.

We need something like the Linux sysfs location or similar to have a
stable implementation.

Regards,
Christian.

   I don't see a need to complicate
the interfaces more by having major and a key.  As you can see in the
examples below, the drm device minor corresponds to the line number.
I am not sure how strict cgroup upstream is about the convention but I
am hoping there are flexibility here to allow for what I have
implemented.  There are a couple of other things I have done that is
not described in the convention: 1) inclusion of read-only *.help file
at the root cgroup, 2) use read-only (which I can potentially make rw)
*.default file instead of having a default entries (since the default
can be different for different devices) inside the control files (this
way, the resetting of cgroup values for all the drm devices, can be
done by a simple 'cp'.)

Usage examples:
// set limit for card1 to 1GB
sed -i '2s/.*/1073741824/' /sys/fs/cgroup/<cgroup>/drm.buffer.total.max

// set limit for card0 to 512MB
sed -i '1s/.*/536870912/' /sys/fs/cgroup/<cgroup>/drm.buffer.total.max
   /** @file drm_gem.c
@@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
        obj->handle_count = 0;
        obj->size = size;
        drm_vma_node_reset(&obj->vma_node);
+
+     obj->drmcgrp = get_drmcgrp(current);
+     drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
Why do the charging here?
There is no backing store yet for the buffer, so this is really tracking something akin to allowed virtual memory for GEM objects?
Is this really useful for an administrator to control?
Isn't the resource we want to control actually the physical backing store?
That's correct.  This is just the first level of control since the
backing store can be backed by different type of memory.  I am in the
process of adding at least two more resources.  Stay tuned.  I am
doing the charge here to enforce the idea of "creator is deemed owner"
at a place where the code is shared by all (the init function.)

+     while (i <= max_minor && limits != NULL) {
+             sval =  strsep(&limits, "\n");
+             rc = kstrtoll(sval, 0, &val);
Input should be "KEY VALUE", so KEY will determine device to apply this to.
Also, per cgroups-v2 documentation of limits, I believe need to parse and handle the special "max" input value.

parse_resources() in rdma controller is example for both of above.
Please see my previous reply for the rationale of my hope to not need
a key.  I can certainly add handling of "max" and "default".


+void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
+             size_t size)
Shouldn't this return an error and be implemented with same semantics as the
try_charge() functions of other controllers?
Below will allow stats_total_allocated to overrun limits_total_allocated.
This is because I am charging the buffer at the init of the buffer
which does not fail so the "try" (drmcgrp_bo_can_allocate) is separate
and placed earlier and nearer other condition where gem object
allocation may fail.  In other words, there are multiple possibilities
for which gem allocation may fail (cgroup limit being one of them) and
satisfying cgroup limit does not mean a charge is needed.  I can
certainly combine the two functions to have an additional try_charge
semantic as well if that is really needed.

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux