On Tue, Sep 3, 2019 at 8:50 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello, Daniel.
>
> On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote:
> > > * While breaking up and applying control to different types of
> > > internal objects may seem attractive to folks who work day in and
> > > day out with the subsystem, they aren't all that useful to users and
> > > the siloed controls are likely to make the whole mechanism a lot
> > > less useful. We had the same problem with cgroup1 memcg - putting
> > > control of different uses of memory under separate knobs. It made
> > > the whole thing pretty useless. e.g. if you constrain all knobs
> > > tight enough to control the overall usage, overall utilization
> > > suffers, but if you don't, you really don't have control over actual
> > > usage. For memcg, what has to be allocated and controlled is
> > > physical memory, no matter how they're used. It's not like you can
> > > go buy more "socket" memory. At least from the looks of it, I'm
> > > afraid gpu controller is repeating the same mistakes.
> >
> > We do have quite a pile of different memories and ranges, so I don't
> > think we're doing the same mistake here. But it is maybe a bit too
>
> I see. One thing which caught my eyes was the system memory control.
> Shouldn't that be controlled by memcg? Is there something special
> about system memory used by gpus?

I think keeping system memory separate from vram makes sense. For one,
vram is easily 10x+ faster than system memory, so we definitely want to
have good control over that. But maybe we only want one vram bucket
overall for the entire system?

The trouble with system memory is that gpu tasks pin that memory to
prep execution. There are two solutions:

- i915 has a shrinker. Lots (and I really mean lots) of pain with
  direct reclaim recursion, which often means we can't free memory, and
  we're angering the oom killer a lot. Plus it introduces really bad
  latency spikes everywhere (gpu workloads are occasionally really
  slow, think "worse than pageout to spinning rust", to get memory
  freed).

- ttm just has a global limit, set to 50% of system memory.

I do think a global system memory limit to tame the shrinker, without
the ttm approach of possibly just wasting half your memory, could be
useful. (Rough sketch of what I have in mind below my sig.)

> > complicated, and exposes stuff that most users really don't care about.
>
> Could be from me not knowing much about gpus but definitely looks too
> complex to me. I don't see how users would be able to allocate vram,
> system memory and GART with reasonable accuracy. memcg on cgroup2
> deals with just a single number and that's already plenty challenging.

Yeah, especially wrt GART and some of the other more specialized
things, I don't think there's any modern gpu where you can actually run
out of that stuff. At least not before you run out of every other kind
of memory (GART is just a remapping table to make system memory visible
to the gpu).

I'm also not sure about the bw limits, given all the fun we have on the
block io cgroups side. Aside from that, the current bw limit only
controls the bw the kernel uses; userspace can submit unlimited amounts
of copy commands directly to the gpu that use the same pcie links,
bypassing this cg knob.

Also, controlling execution time for gpus is very tricky, since they
work a lot more like a block io device, or maybe a network controller
with packet scheduling, than like a cpu.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
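
P.S. To make the "single system memory knob" idea a bit more concrete:
below is a very rough sketch of what the charge path for such a
per-cgroup limit could look like. All the gpucg_* names are made up
purely for illustration, this is not actual i915/ttm/cgroup code - just
the rough shape of "charge before pinning, fall back to eviction
instead of direct reclaim".

#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/types.h>

/* per-cgroup accounting of system memory pinned for gpu use (sketch only) */
struct gpucg_sysmem {
	atomic64_t usage;	/* bytes currently pinned by this cgroup */
	u64 max;		/* limit, would be set through a cgroup knob */
};

/*
 * Called before pinning pages for execution. Returns 0 on success,
 * -ENOMEM if the charge would exceed the limit - the caller then
 * evicts/shrinks its own buffers instead of diving into direct reclaim.
 */
static int gpucg_charge_sysmem(struct gpucg_sysmem *gcg, u64 nr_bytes)
{
	if (atomic64_add_return(nr_bytes, &gcg->usage) > gcg->max) {
		atomic64_sub(nr_bytes, &gcg->usage);
		return -ENOMEM;
	}
	return 0;
}

/* Called when the buffers are unpinned again. */
static void gpucg_uncharge_sysmem(struct gpucg_sysmem *gcg, u64 nr_bytes)
{
	atomic64_sub(nr_bytes, &gcg->usage);
}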