Adding Kenny, who recently started looking into cgroup support from the AMD GPU computing side.

Regards,
  Felix

On 2018-04-05 11:48 AM, Matt Roper wrote:
> Just realized Joonas had some other comments down at the bottom that I missed originally...
>
> On Thu, Apr 05, 2018 at 08:06:23AM -0700, Matt Roper wrote:
>> On Thu, Apr 05, 2018 at 07:49:44AM -0700, Matt Roper wrote:
>>> On Thu, Apr 05, 2018 at 05:15:13PM +0300, Joonas Lahtinen wrote:
>>>> + Some more Cc's based on IRC discussion
>>>>
>>>> Quoting Joonas Lahtinen (2018-04-05 16:46:51)
>>>>> + Dave for commenting from the DRM subsystem perspective. I strongly believe there would be benefit from agreeing on some foundation of DRM-subsystem-level program GPU niceness [-20,19] and memory limit [0,N] pages.
>>> +Chris Wilson
>> +more Cc's based on IRC discussion.
>>
>> Matt
>>
>>> If we ignore backward compatibility and ABI issues for now and assume all drivers can move to [-20,19] for application-accessible priority ranges (i.e., self opt-in via existing context parameter ioctls and such), I think we still need a much larger range for the priority offset assigned via cgroup. One of the goals of the cgroup work is to give the system integrator the ability to define classes of applications with non-overlapping ranges (i.e., in some systems an app in the "very important" cgroup that self-lowers itself as far as possible should still be prioritized higher than an app in the "best effort" cgroup that self-boosts itself as far as possible). Chris Wilson and I discussed that a bit on this thread if you want to see the context:
>>>
>>> https://lists.freedesktop.org/archives/intel-gfx/2018-March/158204.html
>>>
>>> That discussion was based on i915's current application-accessible range of [-1023,1023]. If we shift to a much smaller [-20,19] range for applications to use directly, then we might want to allow the cgroup offset to be something like [-1000,1000] to ensure the system integrator has enough flexibility?
>>>
>>> Also note that "display boost" (i.e., a priority boost for contexts that are blocking a flip) may vary depending on system setup, so being able to define the display boost's effective priority via cgroup is important as well.
>>>
>>> Matt
>>>
>>>>> Quoting Matt Roper (2018-03-30 03:43:13)
>>>>>> On Mon, Mar 26, 2018 at 10:30:23AM +0300, Joonas Lahtinen wrote:
>>>>>>> Quoting Matt Roper (2018-03-23 17:46:16)
>>>>>>>> On Fri, Mar 23, 2018 at 02:15:38PM +0200, Joonas Lahtinen wrote:
>>>>>>>>> Quoting Matt Roper (2018-03-17 02:08:57)
>>>>>>>>>> This is the fourth iteration of the work previously posted here:
>>>>>>>>>> (v1) https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
>>>>>>>>>> (v2) https://www.mail-archive.com/dri-devel@xxxxxxxxxxxxxxxxxxxxx/msg208170.html
>>>>>>>>>> (v3) https://lists.freedesktop.org/archives/intel-gfx/2018-March/157928.html
>>>>>>>>>>
>>>>>>>>>> The high-level goal of this work is to allow non-cgroup-controller parts of the kernel (e.g., device drivers) to register their own private policy data for specific cgroups. That mechanism is then made use of in the i915 graphics driver to allow GPU priority to be assigned according to the cgroup membership of the owning process. Please see the v1 cover letter linked above for a more in-depth explanation and justification.
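
To make the ranges discussed above concrete, here is a minimal sketch of how a cgroup-assigned priority offset in [-1000,1000] could combine with an application's self-assigned priority in [-20,19]; the helper and constant names are hypothetical and not taken from the posted series:

/* Hypothetical helper, not from the posted series: combine a
 * cgroup-assigned offset with a context's self-assigned priority.
 * As long as the integrator spaces class offsets at least 40 apart,
 * a context can never boost itself out of its cgroup's class. */
#define CGROUP_OFFSET_MIN  (-1000)
#define CGROUP_OFFSET_MAX    1000
#define CTX_PRIO_MIN          (-20)
#define CTX_PRIO_MAX            19

static int clamp_int(int v, int lo, int hi)
{
        return v < lo ? lo : (v > hi ? hi : v);
}

static int effective_priority(int cgroup_offset, int ctx_prio)
{
        cgroup_offset = clamp_int(cgroup_offset, CGROUP_OFFSET_MIN, CGROUP_OFFSET_MAX);
        ctx_prio = clamp_int(ctx_prio, CTX_PRIO_MIN, CTX_PRIO_MAX);

        /* e.g. "very important" at +1000 self-lowered to -20 gives 980,
         * which still beats "best effort" at 0 self-boosted to +19. */
        return cgroup_offset + ctx_prio;
}
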
>>>>>>>>> Hi Matt,
>>>>>>>>>
>>>>>>>>> For cross-subsystem changes such as this, it makes sense to Cc all relevant maintainers, especially if there have been previous comments to earlier revisions.
>>>>>>>>>
>>>>>>>>> Please do include and keep a reference to the userspace portion of the changes when you suggest new uAPI to be added. At least I have some trouble trying to track down the relevant interface consumer here.
>>>>>>>>>
>>>>>>>>> I'm unsure how much sense it makes to commence with detailed i915 review if we will be blocked by lack of userspace after that? I'm assuming you've read through [1] already.
>>>>>>>> Hi Joonas,
>>>>>>>>
>>>>>>>> I've sent the userspace code out a few times, but it looks like I forgot to include a copy with v4/v4.5. Here's the version I provided with v3:
>>>>>>>> https://lists.freedesktop.org/archives/intel-gfx/2018-March/157935.html
>>>>>>> Thanks. Keeping that in the relevant commit message of the patch that introduces the new uAPI will make it harder to forget and easier for git blame, too.
>>>>>>>
>>>>>>>> Usually we don't consider things like i-g-t to be sufficient userspace consumers because we need a real-world consumer rather than a "toy" userspace. However, in this case the i-g-t tool, although very simple, is really the only userspace consumer I expect there to ever be. Ultimately the true consumers of this cgroups work are bash scripts, sysv init scripts, systemd recipes, etc. that just need a very simple tool to assign the specific values that make sense on a given system. There's no expectation that graphics clients or display servers would ever need to make use of these interfaces.
>>>>>>> I was under the impression that a bit more generic GPU cgroups support was receiving a lot of support in the early discussion? A dedicated intel_cgroup sounds underwhelming for user adoption, when compared to the idea of "gpu_nice" :)
>>>>>> I'm open to moving the cgroup_priv registration/lookup to the DRM core if other drivers are interested in using this mechanism and if we can come to an agreement on a standard priority offset range to support, how display boost should work for all drivers, etc. There might be some challenges mapping a DRM-defined priority range down to a different range that makes sense for individual driver schedulers, especially since some drivers already expose a different priority scheme to userspace via other interfaces, like i915 does with GEM context priority.
>>>>>>
>>>>>> So far I haven't really heard any interest outside the Intel camp, but hopefully other driver teams can speak up if they're for/against this. I don't want to try to artificially standardize this if other drivers want to go a different direction with priority/scheduling that's too different from the current Intel-driven design.
>>>>> I don't think there are that many directions to go about GPU context priority, considering we have the EGL_IMG_context_priority extension, so it'll only be about the granularity of the scale.
>>>>>
>>>>> I would suggest going with the nice-like scale for easy user adoption, and then just applying that as the N most significant bits.
>>>>>
>>>>> The contexts could then of course further adjust their priority, within the remaining bits, from what is set by the "gpu_nice" application.
>>>>>
>>>>> I'm strongly feeling this should be a DRM-level "gpu_nice", and the binding to cgroups should come through the DRM core. If it doesn't, limiting the amount of memory used becomes awkward, as the allocation is centralized in the DRM core.
> My earliest iterations of this work did add the interfaces to set cgroup-based GPU priority at the DRM core. I didn't really get the sense at the time that anyone outside of Intel was interested in the cgroups work, so I moved the interfaces back into i915 for subsequent updates, but I'm happy to move it back. Note that even in that case this will still end up as a DRM core ioctl rather than a true cgroups controller; see below.
>
>>>>>>> Also, I might not be up-to-date about all things cgroups, but the way intel_cgroup works feels a bit forced. We create a userspace context just to communicate with the driver, and the IOCTL will still have global effects. I can't help but think that i915 reading from the cgroups subsystem for the current process would feel more intuitive to me.
>>>>>> I think you're referring to the earlier discussion about exposing priority directly via the cgroups filesystem? That would certainly be simpler from a userspace perspective, but it's not the direction that the cgroups maintainer wants to see things go. Adding files directly to the cgroups filesystem is supposed to be reserved for official cgroups controllers. The GPU priority concept we're trying to add here doesn't align with the requirements for creating a controller, so the preferred approach is to create a custom interface (syscall or ioctl) that simply takes a cgroup as a parameter. There's precedent with similar interfaces in areas like BPF (where the bpf() system call can accept a cgroup as a parameter and then perform its own private policy changes as it sees fit).
>>>>>>
>>>>>> Using a true cgroups controller and exposing settings via the filesystem is likely still the way we'll want to go for some other types of cgroups-based policy in the future (e.g., limiting GPU memory usage); it just isn't the appropriate direction for priority.
>>>>> Might be just me, but it feels a bit crazy to be setting GPU memory usage through another interface and then doing i915-specific IOCTLs to control the priority of that same cgroup.
>>>>>
>>>>> I don't feel comfortable adding custom cgroups-dependent IOCTLs to i915 where cgroups is only working as the variable carrier in the background. We should really just be consuming a variable from cgroups, and it should be set outside of the i915 IOCTL interface.
>>>>>
>>>>> I'm still seeing that we should have a DRM cgroups controller and a DRM-subsystem-wide application to control the priority and memory usage to be fed to the drivers.
> A cgroups controller could be used for GPU memory management, but not for priority adjustment the way we're doing it right now. Although there's some preexisting legacy stuff that doesn't quite follow the rules, the cgroups subsystem will only accept new cgroup controllers if they take the entire structure of the cgroups hierarchy into account (rather than just looking at values at a leaf node) and are based on one of the four resource distribution models described in Documentation/cgroup-v2.txt (weights, limits, protections, allocations). GPU priority is somewhat close to a "weight" but not quite: a true weight wouldn't allow unfair scheduling or intentional starvation, which are very important to the use cases I'm focusing on.
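
For illustration, here is a minimal userspace sketch of the bpf()-style pattern described above, where a cgroup is identified by an open fd on its cgroupfs directory and passed to a driver ioctl. The struct, ioctl number, and parameter names are made up for this sketch and are not the actual uAPI from the patch series:

/* Hypothetical userspace sketch of the bpf()-style pattern: identify
 * the cgroup by an open fd on its cgroupfs directory and hand it to a
 * device ioctl. Struct, ioctl number, and parameter names are invented
 * for illustration; the real uAPI is in the posted patches. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct gpu_cgroup_setparam {
        int32_t  cgroup_fd;   /* fd of the cgroup directory             */
        uint32_t param;       /* which per-cgroup policy value to set    */
        int64_t  value;       /* e.g. the priority offset for the group  */
};

#define GPU_CGROUP_PARAM_PRIO_OFFSET  1
#define GPU_CGROUP_SETPARAM  _IOW('d', 0x40, struct gpu_cgroup_setparam)

static int set_cgroup_gpu_priority(int drm_fd, const char *cgroup_path,
                                   int64_t prio_offset)
{
        struct gpu_cgroup_setparam arg = {
                .param = GPU_CGROUP_PARAM_PRIO_OFFSET,
                .value = prio_offset,
        };
        int ret;

        arg.cgroup_fd = open(cgroup_path, O_RDONLY | O_DIRECTORY);
        if (arg.cgroup_fd < 0)
                return -1;

        /* The driver resolves the cgroup from the fd and stores the
         * offset as its private policy data for that cgroup. */
        ret = ioctl(drm_fd, GPU_CGROUP_SETPARAM, &arg);
        close(arg.cgroup_fd);
        return ret;
}
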
> The cgroups maintainer has suggested that a separate interface (syscall, ioctl, etc.) be used for items like this that don't meet the criteria for a full cgroups controller, and there are other examples of this kind of design elsewhere in the kernel too.
>
>>>>> If we end up just supporting i915 apps, we could as well use an LD_PRELOAD wrapper and alter the context priority at creation time for exactly the same effect and no extra interfaces to maintain.
> I think this is missing the point of using cgroups. cgroups allow policy to be set independently by a privileged system integrator or system administrator, in a way that applications themselves don't have access to. LD_PRELOAD would only have access to the interfaces available to the app itself, and the app could simply undo whatever setting the LD_PRELOAD tried to use.
>
> The goal with cgroups isn't to provide a "gpu nice" where you self-classify yourself, but rather the ability for the person defining how the overall system works to classify and control the various types of applications running, in a transparent and global way.
>
>>>>>>> Does the implementation mimic some existing cgroups tool or de-facto way of doing things in the cgroups world?
>>>>>> The ioctl approach I took is similar to the syscall approach that the BPF guys use to attach BPF programs to a cgroup. I'm not very familiar with BPF or how it gets used from userspace, so I'm not sure whether the interface is intended for one specific tool (like ours is) or whether there's more variety for userspace consumers.
>>>>> Is the proposal to set the memory usage from a similar interface, or is that still not implemented?
> GPU memory management is a very different resource model, so I'd expect memory to be managed by a real cgroups controller, i.e., the interface would be different. I haven't personally done any work on cgroups-based memory management; I agree that it would be nice to have, but none of the use cases my team is focused on have that as a requirement.
>
>>>>> I'm seeing a very close relation between time-slicing GPU time and allowed GPU buffer allocations, so having two completely different interfaces just feels like a very hackish way of implementing this.
> If we were time slicing GPU time and giving out fractional shares of time to each client, that would be a very different situation than we have today; in that case I agree that a cgroups controller would be the way to go. While there probably are a lot of cases where time slicing and fair scheduling would be beneficial (especially stuff like desktop or mobile settings), I'm mostly focused on embedded use cases that depend on today's unfair, strict-priority scheduling.
>
> Matt
>
>>>>> Regards, Joonas
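
As a rough illustration of the distinction drawn in the thread between strict-priority scheduling and a cgroup-style "weight", here is a hypothetical sketch of the two pick policies; the struct and function names are invented for this example and are not from any driver:

/* Hypothetical sketch, not from any driver: contrast the two models.
 * Positive weights are assumed. */
struct gpu_client {
        int       priority;   /* strict model: higher always wins          */
        int       weight;     /* weight model: relative share of GPU time  */
        long long runtime;    /* weight model: GPU time consumed so far    */
};

/* Strict priority: always run the highest-priority client. A busy
 * high-priority cgroup can intentionally starve everyone else, which
 * is exactly what the embedded use cases above rely on. */
static struct gpu_client *pick_strict(struct gpu_client *c, int n)
{
        struct gpu_client *best = &c[0];
        for (int i = 1; i < n; i++)
                if (c[i].priority > best->priority)
                        best = &c[i];
        return best;
}

/* Weight-based: run the client with the smallest runtime/weight ratio
 * (compared by cross-multiplication), so every client with a non-zero
 * weight keeps making progress and starvation is impossible. */
static struct gpu_client *pick_weighted(struct gpu_client *c, int n)
{
        struct gpu_client *best = &c[0];
        for (int i = 1; i < n; i++)
                if (c[i].runtime * best->weight < best->runtime * c[i].weight)
                        best = &c[i];
        return best;
}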