Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

"Ho, Kenny" <Kenny.Ho@xxxxxxx> · Mon, 3 Dec 2018 06:46:01 +0000

Hey Matt,

On Fri, Nov 30, 2018 at 5:22 PM Matt Roper <matthew.d.roper@xxxxxxxxx> wrote:
> I think Joonas is describing something closer in
> design to the cgroup-v2 "cpu" controller, which partitions the general
> time/usage allocated to via cgroup; afaiu, "cpu" doesn't really care
> which specific core the tasks run on, just the relative weights that
> determine how much time they get to run on any of the cores.

Depending on the level of optimization one wants to do, I think people care about which cpu core a task runs on.  Modern processors are no longer a monolithic 'thing'.  At least for AMD, there are multiple cpus on a core complex (CCX), multiple CCX on a die, and multiple dies on a processor.  A task running on cpu 0 and cpu 1 on die 0 will behave very differently from a task running on core 0s on die 0 and die 1 on the same socket.  (https://en.wikichip.org/wiki/amd/microarchitectures/zen#Die-die_memory_latencies)

It's not just an AMD thing either.  Here is an open issue on Intel's architecture:
https://github.com/kubernetes/kubernetes/issues/67355

and a proposed solution using cpu affinity https://github.com/kubernetes/community/blob/630acc487c80e4981a232cdd8400eb8207119788/keps/sig-node/0030-qos-class-cpu-affinity.md#proposal (by one of your colleagues.)

The time-based sharing below is also something we are thinking about, but it's personally not as exciting as the resource-based sharing for me because the time-share use case has already been addressed by our SRIOV/virtualization products.  We can potentially have different level of time sharing using cgroup though (in addition to SRIOV), potentially trading efficiency against isolation.  That said, I think the time-based approach maybe orthogonal to the resource-based approach (orthogonal in the sense that both are needed depending on the usage.)

Regards,
Kenny

> It sounds like with your hardware, your kernel driver is able to specify
> exactly which subset of GPU EU's a specific GPU context winds up running
> on.  However I think there are a lot of platforms that don't allow that
> kind of low-level control.  E.g., I don't think we can do that on Intel
> hardware; we have a handful of high-level GPU engines that we can submit
> different types of batchbuffers to (render, blitter, media, etc.).  What
> we can do is use GPU preemption to limit how much time specific GPU
> contexts get to run on the render engine before the engine is reclaimed
> for use by a different context.
>
> Using a %gputime approach like Joonas is suggesting could be handled in
> a driver by reserving specific subsets of EU's on hardware like yours
> that's capable of doing that, whereas it could be mostly handled on
> other types of hardware via GPU engine preemption.
>
> I think either approach "gpu_euset" or "%gputime" should map well to a
> cgroup controller implementation.  Granted, neither one solves the
> specific use case I was working on earlier this year where we need
> unfair (starvation-okay) scheduling that will run contexts strictly
> according to priority (i.e., lower priority contexts will never run at
> all unless all higher priority contexts have completed all of their
> submitted work), but that's a pretty specialized use case that we'll
> probably need to handle in a different manner anyway.
>
>
> Matt
>
>
> > Regards,
> > Kennny
> >
> >
> > > > > That combined with the "GPU memory usable" property should be a good
> > > > > starting point to start subdividing the GPU resources for multiple
> > > > > users.
> > > > >
> > > > > Regards, Joonas
> > > > >
> > > > > >
> > > > > > Your feedback is highly appreciated.
> > > > > >
> > > > > > Best Regards,
> > > > > > Harish
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Tejun Heo <tj@xxxxxxxxxx>
> > > > > > Sent: Tuesday, November 20, 2018 5:30 PM
> > > > > > To: Ho, Kenny
> > > > > > Cc: cgroups@xxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; y2kenny@xxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > > Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices
> > > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 10:21:14PM +0000, Ho, Kenny wrote:
> > > > > > > By this reply, are you suggesting that vendor specific resources
> > > > > > > will never be acceptable to be managed under cgroup?  Let say a user
> > > > > >
> > > > > > I wouldn't say never but whatever which gets included as a cgroup
> > > > > > controller should have clearly defined resource abstractions and the
> > > > > > control schemes around them including support for delegation.  AFAICS,
> > > > > > gpu side still seems to have a long way to go (and it's not clear
> > > > > > whether that's somewhere it will or needs to end up).
> > > > > >
> > > > > > > want to have similar functionality as what cgroup is offering but to
> > > > > > > manage vendor specific resources, what would you suggest as a
> > > > > > > solution?  When you say keeping vendor specific resource regulation
> > > > > > > inside drm or specific drivers, do you mean we should replicate the
> > > > > > > cgroup infrastructure there or do you mean either drm or specific
> > > > > > > driver should query existing hierarchy (such as device or perhaps
> > > > > > > cpu) for the process organization information?
> > > > > > >
> > > > > > > To put the questions in more concrete terms, let say a user wants to
> > > > > > > expose certain part of a gpu to a particular cgroup similar to the
> > > > > > > way selective cpu cores are exposed to a cgroup via cpuset, how
> > > > > > > should we go about enabling such functionality?
> > > > > >
> > > > > > Do what the intel driver or bpf is doing?  It's not difficult to hook
> > > > > > into cgroup for identification purposes.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > --
> > > > > > tejun
> > > > > > _______________________________________________
> > > > > > amd-gfx mailing list
> > > > > > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > > >
> > > > > >
> > > > > > amd-gfx Info Page - freedesktop.org
> > > > > > lists.freedesktop.org
> > > > > > To see the collection of prior postings to the list, visit the amd-gfx Archives.. Using amd-gfx: To post a message to all the list members, send email to amd-gfx@xxxxxxxxxxxxxxxxxxxxx. You can subscribe to the list, or change your existing subscription, in the sections below.
> > > > > >
> > > > > > _______________________________________________
> > > > > > Intel-gfx mailing list
> > > > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Matt Roper
> Graphics Software Engineer
> IoTG Platform Enabling & Development
> Intel Corporation
> (916) 356-2795
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx