On Mon, Dec 03, 2018 at 06:46:01AM +0000, Ho, Kenny wrote:
> Hey Matt,
>
> On Fri, Nov 30, 2018 at 5:22 PM Matt Roper <matthew.d.roper@xxxxxxxxx> wrote:
> > I think Joonas is describing something closer in
> > design to the cgroup-v2 "cpu" controller, which partitions the general
> > time/usage allocated to via cgroup; afaiu, "cpu" doesn't really care
> > which specific core the tasks run on, just the relative weights that
> > determine how much time they get to run on any of the cores.
>
> Depending on the level of optimization one wants to do, I think people
> care about which cpu core a task runs on. Modern processors are no
> longer a monolithic 'thing'. At least for AMD, there are multiple
> cpus on a core complex (CCX), multiple CCX on a die, and multiple dies
> on a processor. A task running on cpu 0 and cpu 1 on die 0 will
> behave very differently from a task running on core 0s on die 0 and
> die 1 on the same socket.
> (https://en.wikichip.org/wiki/amd/microarchitectures/zen#Die-die_memory_latencies)
>
> It's not just an AMD thing either. Here is an open issue on Intel's architecture:
> https://github.com/kubernetes/kubernetes/issues/67355
>
> and a proposed solution using cpu affinity
> https://github.com/kubernetes/community/blob/630acc487c80e4981a232cdd8400eb8207119788/keps/sig-node/0030-qos-class-cpu-affinity.md#proposal
> (by one of your colleagues.)

Right, I didn't mean to imply that the use case wasn't valid; I was just
referring to how I believe the cgroup-v2 'cpu' controller (i.e.,
cpu_cgrp_subsys) currently behaves, as a contrast to the behavior of the
cgroup-v1 'cpuset' controller. I can definitely understand your
motivation for wanting something along the lines of a "gpuset"
controller, but as far as I know, that just isn't something that's
possible to implement on a lot of GPUs.
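To make the weight-versus-placement contrast concrete, here is a minimal Python sketch. It assumes a simplified model where every sibling group is always runnable and shows only the proportional arithmetic. It is not how the kernel's scheduler is implemented. The helper names `weighted_shares` and `timeslices_us` are hypothetical. Only the [1, 10000] range and the default of 100 come from the cgroup-v2 `cpu.weight` interface.

```python
from __future__ import annotations


def weighted_shares(weights: dict[str, int]) -> dict[str, float]:
    """Return each group's fraction of contended CPU time (simplified model).

    Assumption: all sibling groups are runnable, so a group's share is its
    weight divided by the sum of all sibling weights. Which core the work
    runs on is not part of this model, unlike cpuset-style pinning.
    """
    for name, w in weights.items():
        # cgroup-v2 cpu.weight accepts values in [1, 10000]; default is 100.
        if not 1 <= w <= 10000:
            raise ValueError(f"{name}: weight {w} outside [1, 10000]")
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def timeslices_us(weights: dict[str, int], period_us: int) -> dict[str, int]:
    """Split one period (microseconds) in proportion to weight.

    Hypothetical helper: results are truncated to whole microseconds, so
    the slices can sum to slightly less than period_us.
    """
    shares = weighted_shares(weights)
    return {name: int(period_us * s) for name, s in shares.items()}


if __name__ == "__main__":
    groups = {"batch": 100, "interactive": 300}
    print(weighted_shares(groups))          # {'batch': 0.25, 'interactive': 0.75}
    print(timeslices_us(groups, 100_000))   # {'batch': 25000, 'interactive': 75000}
```

Nothing in this model says which core a group runs on. Pinning work to particular cores, as a cpuset-style "gpuset" would, is a separate mechanism based on placement rather than weight.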
>
> The time-based sharing below is also something we are thinking about,
> but it's personally not as exciting as the resource-based sharing for
> me because the time-share use case has already been addressed by our
> SRIOV/virtualization products. We can potentially have different
> levels of time sharing using cgroup though (in addition to SRIOV),
> potentially trading efficiency against isolation. That said, I think
> the time-based approach may be orthogonal to the resource-based
> approach (orthogonal in the sense that both are needed depending on
> the usage.)

Makes sense.


Matt

>
> Regards,
> Kenny
>
>
> > It sounds like with your hardware, your kernel driver is able to specify
> > exactly which subset of GPU EUs a specific GPU context winds up running
> > on. However I think there are a lot of platforms that don't allow that
> > kind of low-level control. E.g., I don't think we can do that on Intel
> > hardware; we have a handful of high-level GPU engines that we can submit
> > different types of batchbuffers to (render, blitter, media, etc.). What
> > we can do is use GPU preemption to limit how much time specific GPU
> > contexts get to run on the render engine before the engine is reclaimed
> > for use by a different context.
> >
> > Using a %gputime approach like Joonas is suggesting could be handled in
> > a driver by reserving specific subsets of EUs on hardware like yours
> > that's capable of doing that, whereas it could be mostly handled on
> > other types of hardware via GPU engine preemption.
> >
> > I think either approach "gpu_euset" or "%gputime" should map well to a
> > cgroup controller implementation.
> > Granted, neither one solves the
> > specific use case I was working on earlier this year where we need
> > unfair (starvation-okay) scheduling that will run contexts strictly
> > according to priority (i.e., lower priority contexts will never run at
> > all unless all higher priority contexts have completed all of their
> > submitted work), but that's a pretty specialized use case that we'll
> > probably need to handle in a different manner anyway.
> >
> >
> > Matt
> >
> >
> > > Regards,
> > > Kenny
> > >
> > >
> > > > > > That combined with the "GPU memory usable" property should be a good
> > > > > > starting point to start subdividing the GPU resources for multiple
> > > > > > users.
> > > > > >
> > > > > > Regards, Joonas
> > > > > >
> > > > > > >
> > > > > > > Your feedback is highly appreciated.
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Harish
> > > > > > >
> > > > > > >
> > > > > > > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Tejun Heo <tj@xxxxxxxxxx>
> > > > > > > Sent: Tuesday, November 20, 2018 5:30 PM
> > > > > > > To: Ho, Kenny
> > > > > > > Cc: cgroups@xxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; y2kenny@xxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 10:21:14PM +0000, Ho, Kenny wrote:
> > > > > > > > By this reply, are you suggesting that vendor specific resources
> > > > > > > > will never be acceptable to be managed under cgroup? Let's say a user
> > > > > > >
> > > > > > > I wouldn't say never, but whatever gets included as a cgroup
> > > > > > > controller should have clearly defined resource abstractions and the
> > > > > > > control schemes around them including support for delegation.
> > > > > > > AFAICS, gpu side still seems to have a long way to go (and it's not clear
> > > > > > > whether that's somewhere it will or needs to end up).
> > > > > > >
> > > > > > > > want to have similar functionality as what cgroup is offering but to
> > > > > > > > manage vendor specific resources, what would you suggest as a
> > > > > > > > solution? When you say keeping vendor specific resource regulation
> > > > > > > > inside drm or specific drivers, do you mean we should replicate the
> > > > > > > > cgroup infrastructure there or do you mean either drm or specific
> > > > > > > > driver should query existing hierarchy (such as device or perhaps
> > > > > > > > cpu) for the process organization information?
> > > > > > > >
> > > > > > > > To put the questions in more concrete terms, let's say a user wants to
> > > > > > > > expose a certain part of a gpu to a particular cgroup similar to the
> > > > > > > > way selected cpu cores are exposed to a cgroup via cpuset, how
> > > > > > > > should we go about enabling such functionality?
> > > > > > >
> > > > > > > Do what the intel driver or bpf is doing? It's not difficult to hook
> > > > > > > into cgroup for identification purposes.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > --
> > > > > > > tejun
> > > > > > > _______________________________________________
> > > > > > > amd-gfx mailing list
> > > > > > > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Intel-gfx mailing list
> > > > > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >
> > --
> > Matt Roper
> > Graphics Software Engineer
> > IoTG Platform Enabling & Development
> > Intel Corporation
> > (916) 356-2795

--
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795