Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

Matt Roper <matthew.d.roper@xxxxxxxxx> · Fri, 30 Nov 2018 14:22:28 -0800

On Wed, Nov 28, 2018 at 07:46:06PM +0000, Ho, Kenny wrote:
> 
> On Wed, Nov 28, 2018 at 4:14 AM Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> wrote:
> > So we can only choose the lowest common denominator, right?
> >
> > Any core count out of total core count should translate nicely into a
> > fraction, so what would be the problem with percentage amounts?
> 
> I don't think having an abstracted resource necessarily equates to
> 'lowest'.  The issue with percentage is the lack of precision.  If you
> look at cpuset cgroup, you can see the specification can be very
> precise:
> 
> # /bin/echo 1-4,6 > cpuset.cpus -> set cpus list to cpus 1,2,3,4,6
> (https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt)
> 
> The driver can translate something like this to core count and then to
> percentage and handle accordingly while the reverse is not possible.
> (You can't tell which set of CUs/EUs a user wants from a percentage
> request.)  It's also not clear to me, from a
> user/application/admin/resource management perspective, how the base
> core count of a GPU is relevant to the workload (since percentage is
> a 'relative' quantity.)  For example, let's say a workload wants to use
> 256 'cores', does it matter if that workload is put on a GPU with 1024
> cores or a GPU with 4096 cores total?
> 
> I am not dismissing the possible need for percentage.  I just think
> there should be a way to accommodate more than just the 'lowest'. 
> 

As you noted, your proposal is similar to the cgroup-v1 "cpuset"
controller, which is sort of a way of partitioning your underlying
hardware resources; I think Joonas is describing something closer in
design to the cgroup-v2 "cpu" controller, which partitions the general
time/usage allocated to each cgroup; afaiu, "cpu" doesn't really care
which specific core the tasks run on, just the relative weights that
determine how much time they get to run on any of the cores.
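
To make the distinction concrete, here is a minimal sketch (Python, for
brevity).  The helper names parse_cpu_list and weight_shares are
hypothetical, and this only illustrates the two models; it is not how
either controller is implemented.  A cpuset-style list names exact
cores and can always be reduced to a fraction, whereas a weight-style
setting only yields relative shares of time:

```python
def parse_cpu_list(spec):
    """Expand a cpuset-style list such as "1-4,6" into sorted core ids."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "1-4" is inclusive on both ends.
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


def weight_shares(weights):
    """Turn per-group weights into fractions of the total run time."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# "cpuset"-like: the exact cores are known, and a fraction follows from them.
cores = parse_cpu_list("1-4,6")      # [1, 2, 3, 4, 6]
fraction = len(cores) / 8            # 0.625, assuming an 8-core machine

# "cpu"-like: only relative time is known; no particular core is implied.
shares = weight_shares({"a": 100, "b": 300})   # {"a": 0.25, "b": 0.75}
```

Going from the core list to a fraction is easy; going from 0.625 back
to "1-4,6" is not possible, which is the asymmetry Kenny pointed out.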

It sounds like with your hardware, your kernel driver is able to specify
exactly which subset of GPU EUs a specific GPU context winds up running
on.  However, I think there are a lot of platforms that don't allow that
kind of low-level control.  E.g., I don't think we can do that on Intel
hardware; we have a handful of high-level GPU engines that we can submit
different types of batchbuffers to (render, blitter, media, etc.).  What
we can do is use GPU preemption to limit how much time specific GPU
contexts get to run on the render engine before the engine is reclaimed
for use by a different context.

Using a %gputime approach like Joonas is suggesting could be handled in
a driver by reserving specific subsets of EUs on hardware like yours
that's capable of doing that, whereas it could be mostly handled on
other types of hardware via GPU engine preemption.
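
As a rough illustration of that split, assuming a hypothetical
percentage knob and made-up helper names (nothing below is an existing
driver or cgroup interface), the same share could be turned into either
an EU count or a time budget per scheduling period:

```python
def eus_for_share(percent, total_eus):
    """EU-partitioning hardware: how many EUs to reserve for a group.

    Rounds down so that the shares of all groups never add up to more
    EUs than exist.  Which EUs to pick is left to the driver.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (percent * total_eus) // 100


def budget_us_for_share(percent, period_us):
    """Preemption-based hardware: run time allowed per period, in us."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (percent * period_us) // 100


# The same 25% request, realized two different ways (numbers are made up).
reserved = eus_for_share(25, 64)            # 16 EUs out of 64
budget = budget_us_for_share(25, 10_000)    # 2500 us out of every 10 ms
```

The 64-EU total and the 10 ms period are arbitrary example values, not
properties of any particular GPU.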

I think either approach, "gpu_euset" or "%gputime", should map well to a
cgroup controller implementation.  Granted, neither one solves the
specific use case I was working on earlier this year where we need
unfair (starvation-okay) scheduling that will run contexts strictly
according to priority (i.e., lower priority contexts will never run at
all unless all higher priority contexts have completed all of their
submitted work), but that's a pretty specialized use case that we'll
probably need to handle in a different manner anyway.
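
For clarity, the strict-priority behaviour described above can be
sketched as follows.  This is a toy model in Python with hypothetical
names and a made-up context representation, not the actual scheduler:
a lower priority context is only picked when no higher priority context
has pending work.

```python
def pick_next(contexts):
    """Return the highest-priority context with pending work, or None."""
    runnable = [c for c in contexts if c["pending"] > 0]
    if not runnable:
        return None
    return max(runnable, key=lambda c: c["priority"])


def run_all(contexts):
    """Run one unit of work at a time and record the execution order."""
    order = []
    ctx = pick_next(contexts)
    while ctx is not None:
        ctx["pending"] -= 1
        order.append(ctx["name"])
        ctx = pick_next(contexts)
    return order


contexts = [
    {"name": "lo", "priority": 0, "pending": 2},
    {"name": "hi", "priority": 10, "pending": 2},
]
order = run_all(contexts)   # ["hi", "hi", "lo", "lo"]
```

If "hi" kept submitting work, "lo" would never run, which is exactly
the starvation that a weight- or percentage-based controller is
designed to prevent.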

Matt

> Regards,
> Kenny
> 
> 
> > > > That combined with the "GPU memory usable" property should be a good
> > > > starting point to start subdividing the GPU resources for multiple
> > > > users.
> > > >
> > > > Regards, Joonas
> > > >
> > > > >
> > > > > Your feedback is highly appreciated.
> > > > >
> > > > > Best Regards,
> > > > > Harish
> > > > >
> > > > >
> > > > >
> > > > > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Tejun Heo <tj@xxxxxxxxxx>
> > > > > Sent: Tuesday, November 20, 2018 5:30 PM
> > > > > To: Ho, Kenny
> > > > > Cc: cgroups@xxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; y2kenny@xxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:21:14PM +0000, Ho, Kenny wrote:
> > > > > > By this reply, are you suggesting that vendor specific resources
> > > > > > will never be acceptable to be managed under cgroup?  Let say a user
> > > > >
> > > > > I wouldn't say never but whatever which gets included as a cgroup
> > > > > controller should have clearly defined resource abstractions and the
> > > > > control schemes around them including support for delegation.  AFAICS,
> > > > > gpu side still seems to have a long way to go (and it's not clear
> > > > > whether that's somewhere it will or needs to end up).
> > > > >
> > > > > > want to have similar functionality as what cgroup is offering but to
> > > > > > manage vendor specific resources, what would you suggest as a
> > > > > > solution?  When you say keeping vendor specific resource regulation
> > > > > > inside drm or specific drivers, do you mean we should replicate the
> > > > > > cgroup infrastructure there or do you mean either drm or specific
> > > > > > driver should query existing hierarchy (such as device or perhaps
> > > > > > cpu) for the process organization information?
> > > > > >
> > > > > > To put the questions in more concrete terms, let say a user wants to
> > > > > > expose certain part of a gpu to a particular cgroup similar to the
> > > > > > way selective cpu cores are exposed to a cgroup via cpuset, how
> > > > > > should we go about enabling such functionality?
> > > > >
> > > > > Do what the intel driver or bpf is doing?  It's not difficult to hook
> > > > > into cgroup for identification purposes.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > --
> > > > > tejun
> > > > > _______________________________________________
> > > > > amd-gfx mailing list
> > > > > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > >
> > > > >
> > > > > amd-gfx Info Page - freedesktop.org
> > > > > lists.freedesktop.org
> > > > > To see the collection of prior postings to the list, visit the amd-gfx Archives.. Using amd-gfx: To post a message to all the list members, send email to amd-gfx@xxxxxxxxxxxxxxxxxxxxx. You can subscribe to the list, or change your existing subscription, in the sections below.
> > > > >
> > > > > _______________________________________________
> > > > > Intel-gfx mailing list
> > > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795