On Mon, Dec 03, 2018 at 06:46:01AM +0000, Ho, Kenny wrote:
> Hey Matt,
>
> On Fri, Nov 30, 2018 at 5:22 PM Matt Roper <matthew.d.roper@xxxxxxxxx> wrote:
> > I think Joonas is describing something closer in design to the
> > cgroup-v2 "cpu" controller, which partitions the general time/usage
> > allocated via cgroup; afaiu, "cpu" doesn't really care which
> > specific core the tasks run on, just the relative weights that
> > determine how much time they get to run on any of the cores.
>
> Depending on the level of optimization one wants to do, I think people
> do care about which cpu core a task runs on.  Modern processors are no
> longer a monolithic 'thing'.  At least for AMD, there are multiple
> cpus on a core complex (CCX), multiple CCXs on a die, and multiple
> dies on a processor.  A task running on cpu 0 and cpu 1 of die 0 will
> behave very differently from one running on core 0 of die 0 and
> core 0 of die 1 on the same socket.
> (https://en.wikichip.org/wiki/amd/microarchitectures/zen#Die-die_memory_latencies)
>
> It's not just an AMD thing either.  Here is an open issue on Intel's
> architecture:
> https://github.com/kubernetes/kubernetes/issues/67355
>
> and a proposed solution using cpu affinity:
> https://github.com/kubernetes/community/blob/630acc487c80e4981a232cdd8400eb8207119788/keps/sig-node/0030-qos-class-cpu-affinity.md#proposal
> (by one of your colleagues.)

Right, I didn't mean to imply that the use case wasn't valid; I was
just referring to how I believe the cgroup-v2 'cpu' controller (i.e.,
cpu_cgrp_subsys) currently behaves, as a contrast to the behavior of
the cgroup-v1 'cpuset' controller.  I can definitely understand your
motivation for wanting something along the lines of a "gpuset"
controller, but as far as I know, that just isn't something that's
possible to implement on a lot of GPUs.
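To make that contrast concrete, here's a minimal userspace sketch of
the two control styles (nothing from an actual driver; it assumes a
cgroup-v2 hierarchy mounted at /sys/fs/cgroup with a pre-created child
group named "gfx", and the core numbers are made up):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a single value to a cgroup control file, warning on failure. */
static void write_knob(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror(path);
		return;
	}
	if (write(fd, val, strlen(val)) < 0)
		perror(path);
	close(fd);
}

int main(void)
{
	/* 'cpu'-controller style: a relative weight; the scheduler
	 * still decides which cores the group's tasks run on. */
	write_knob("/sys/fs/cgroup/gfx/cpu.weight", "200");

	/* 'cpuset'-controller style: pin the group to specific cores,
	 * e.g. core 0 of each of two dies in the topology you describe
	 * (the core numbering here is purely illustrative). */
	write_knob("/sys/fs/cgroup/gfx/cpuset.cpus", "0,8");

	return 0;
}

On a v1 hierarchy the pinning knob would live under
/sys/fs/cgroup/cpuset/<group>/cpuset.cpus instead; cpuset support on
v2 is still quite new.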
> The time-based sharing below is also something we are thinking about,
> but it's personally not as exciting as the resource-based sharing for
> me, because the time-share use case has already been addressed by our
> SRIOV/virtualization products.  We could potentially have different
> levels of time sharing using cgroup (in addition to SRIOV), trading
> efficiency against isolation.  That said, I think the time-based
> approach may be orthogonal to the resource-based approach (orthogonal
> in the sense that both are needed, depending on the usage.)

Makes sense.


Matt

> Regards,
> Kenny
>
> > It sounds like with your hardware, your kernel driver is able to
> > specify exactly which subset of GPU EUs a specific GPU context
> > winds up running on.  However I think there are a lot of platforms
> > that don't allow that kind of low-level control.  E.g., I don't
> > think we can do that on Intel hardware; we have a handful of
> > high-level GPU engines that we can submit different types of
> > batchbuffers to (render, blitter, media, etc.).  What we can do is
> > use GPU preemption to limit how much time specific GPU contexts get
> > to run on the render engine before the engine is reclaimed for use
> > by a different context.
> >
> > Using a %gputime approach like Joonas is suggesting could be
> > handled in a driver by reserving specific subsets of EUs on
> > hardware like yours that's capable of doing that, whereas it could
> > be mostly handled on other types of hardware via GPU engine
> > preemption.
> >
> > I think either approach, "gpu_euset" or "%gputime", should map well
> > to a cgroup controller implementation.  Granted, neither one solves
> > the specific use case I was working on earlier this year, where we
> > need unfair (starvation-okay) scheduling that runs contexts
> > strictly according to priority (i.e., lower-priority contexts will
> > never run at all unless all higher-priority contexts have completed
> > all of their submitted work), but that's a pretty specialized use
> > case that we'll probably need to handle in a different manner
> > anyway.
> >
> >
> > Matt
> >
> > > Regards,
> > > Kenny
> > >
> > > > > > That combined with the "GPU memory usable" property should
> > > > > > be a good starting point to start subdividing the GPU
> > > > > > resources for multiple users.
> > > > > >
> > > > > > Regards, Joonas
> > > > > >
> > > > > > > Your feedback is highly appreciated.
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Harish
> > > > > > >
> > > > > > > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Tejun Heo <tj@xxxxxxxxxx>
> > > > > > > Sent: Tuesday, November 20, 2018 5:30 PM
> > > > > > > To: Ho, Kenny
> > > > > > > Cc: cgroups@xxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; y2kenny@xxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 10:21:14PM +0000, Ho, Kenny wrote:
> > > > > > > > By this reply, are you suggesting that vendor specific
> > > > > > > > resources will never be acceptable to be managed under
> > > > > > > > cgroup?  Let's say a user
> > > > > > >
> > > > > > > I wouldn't say never, but whatever gets included as a
> > > > > > > cgroup controller should have clearly defined resource
> > > > > > > abstractions and control schemes around them, including
> > > > > > > support for delegation.  AFAICS, the gpu side still seems
> > > > > > > to have a long way to go (and it's not clear whether
> > > > > > > that's somewhere it will or needs to end up).
> > > > > > >
> > > > > > > > want to have similar functionality as what cgroup is
> > > > > > > > offering but to manage vendor specific resources, what
> > > > > > > > would you suggest as a solution?  When you say keeping
> > > > > > > > vendor specific resource regulation inside drm or
> > > > > > > > specific drivers, do you mean we should replicate the
> > > > > > > > cgroup infrastructure there, or do you mean either drm
> > > > > > > > or a specific driver should query the existing hierarchy
> > > > > > > > (such as device or perhaps cpu) for the process
> > > > > > > > organization information?
> > > > > > > >
> > > > > > > > To put the questions in more concrete terms, let's say a
> > > > > > > > user wants to expose a certain part of a gpu to a
> > > > > > > > particular cgroup, similar to the way selective cpu
> > > > > > > > cores are exposed to a cgroup via cpuset; how should we
> > > > > > > > go about enabling such functionality?
> > > > > > >
> > > > > > > Do what the intel driver or bpf is doing?  It's not
> > > > > > > difficult to hook into cgroup for identification purposes.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > --
> > > > > > > tejun
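For reference, the "identification" hook Tejun mentions can be pretty
small.  Here's a rough kernel-side sketch: the ioctl and its parameter
struct are hypothetical (not from any real driver), the membership
policy is just one plausible choice, and only the cgroup helpers
(cgroup_get_from_fd, task_dfl_cgroup, cgroup_put) are real v2
interfaces.

#include <linux/cgroup.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Hypothetical uAPI struct -- made up for illustration. */
struct drm_fake_cgroup_param {
	int cgroup_fd;	/* fd of an open cgroup-v2 directory */
};

static int fake_set_cgroup_param(struct drm_fake_cgroup_param *p)
{
	struct cgroup *cgrp = cgroup_get_from_fd(p->cgroup_fd);
	int ret = 0;

	if (IS_ERR(cgrp))
		return PTR_ERR(cgrp);

	/* One plausible policy (again, made up): only accept the
	 * request if the caller itself belongs to the cgroup it is
	 * trying to configure. */
	rcu_read_lock();
	if (task_dfl_cgroup(current) != cgrp)
		ret = -EPERM;
	rcu_read_unlock();

	/* ... on success, stash per-cgroup GPU limits keyed by cgrp ... */

	cgroup_put(cgrp);
	return ret;
}

Userspace would pass the fd it gets from open(2)ing the cgroup
directory, which I believe is roughly the scheme floated in the
earlier i915 cgroup RFCs.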
--
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795