On Fri, Sep 6, 2019 at 5:23 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello, Daniel.
>
> On Tue, Sep 03, 2019 at 09:48:22PM +0200, Daniel Vetter wrote:
> > I think system memory separate from vram makes sense. For one, vram is
> > like 10x+ faster than system memory, so we definitely want to have
> > good control on that. But maybe we only want one vram bucket overall
> > for the entire system?
> >
> > The trouble with system memory is that gpu tasks pin that memory to
> > prep execution. There's two solutions:
> > - i915 has a shrinker. Lots (and I really mean lots) of pain with
> > direct reclaim recursion, which often means we can't free memory, and
> > we're angering the oom killer a lot. Plus it introduces real bad
> > latency spikes everywhere (gpu workloads are occasionally really slow,
> > think "worse than pageout to spinning rust" to get memory freed).
> > - ttm just has a global limit, set to 50% of system memory.
> >
> > I do think a global system memory limit to tame the shrinker, without
> > the ttm approach of possibly just wasting half your memory, could be
> > useful.
>
> Hmm... what'd be the fundamental difference from slab or socket memory
> which are handled through memcg? Does system memory used by GPUs have
> further global restrictions in addition to the amount of physical
> memory used?

Sometimes, but that would be specific resources (kinda like vram),
e.g. CMA regions used by a gpu. But probably not something you'll run
in a datacenter and want cgroups for ...

I guess we could try to integrate with the memcg group controller. One
trouble is that aside from i915 most gpu drivers do not really have a
full shrinker, so not sure how that would all integrate.

The overall gpu memory controller would still be outside of memcg I
think, since that would include swapped-out gpu objects, and stuff in
special memory regions like vram.

> > I'm also not sure of the bw limits, given all the fun we have on the
> > block io cgroups side.
> > Aside from that the current bw limit only
> > controls the bw the kernel uses, userspace can submit unlimited
> > amounts of copying commands that use the same pcie links directly to
> > the gpu, bypassing this cg knob. Also, controlling execution time for
> > gpus is very tricky, since they work a lot more like a block io device
> > or maybe a network controller with packet scheduling, than a cpu.
>
> At the system level, it just gets folded into cpu time, which isn't
> perfect but is usually a good enough approximation of compute related
> dynamic resources. Can gpu do something similar or at least start with
> that?

So generally there's a pile of engines, often of different types (e.g.
amd hw has an entire pile of copy engines), with some ill-defined
sharing characteristics for some (often compute/render engines use the
same shader cores underneath), kinda like hyperthreading. So at that
detail it's all extremely hw specific, and probably too hard to
control in a useful way for users. And I'm not sure we can really do a
reasonable knob for overall gpu usage, e.g. if we include all the copy
engines, but the workloads are only running on compute engines, then
you might only get 10% overall utilization by engine-time, while the
shaders (which are most of the chip area/power consumption) are
actually at 100%. On top, with many userspace apis those engines are
an internal implementation detail of a more abstract gpu device (e.g.
opengl), but with others, this is all fully exposed (like vulkan).

Plus the kernel needs to use at least copy engines for vram management
itself, and you really can't take that away. Although Kenny here has
some proposal for a separate cgroup resource just for that.

I just think it's all a bit too ill-defined, and we might be better
off nailing the memory side first and getting some real-world
experience on this stuff. For context, there's not even a cross-driver
standard for how priorities are handled, that's all driver-specific
interfaces.
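
To put rough numbers on the engine-time point above, here is a toy
calculation. The engine layout (one compute engine, nine copy engines)
and the busy fractions are made up purely for illustration and do not
describe any real hardware; the only thing it shows is how an equal
per-engine average can read 10% while the shader cores are saturated.

```python
# Hypothetical engine layout: one compute engine plus nine copy engines.
# Counts and busy fractions are illustrative assumptions, not real hw data.
busy_fraction = {"compute0": 1.0}
busy_fraction.update({f"copy{i}": 0.0 for i in range(9)})


def engine_time_utilization(busy):
    """Average busy fraction over all engines, weighting each engine equally."""
    return sum(busy.values()) / len(busy)


overall = engine_time_utilization(busy_fraction)

# Assumption: compute0 is the only engine that runs on the shader cores,
# so its busy fraction stands in for shader utilization.
shader_util = busy_fraction["compute0"]

print(f"engine-time utilization: {overall:.0%}")
print(f"shader utilization: {shader_util:.0%}")
```

Any knob built on the first number would treat this workload as mostly
idle, which is why a single "overall gpu usage" figure looks hard to
define.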
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch