On Tue, Nov 3, 2020 at 12:43 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
> On Mon, Nov 2, 2020 at 9:39 PM Kenny Ho <y2kenny@xxxxxxxxx> wrote:
>
> pls don't top post.

My apologies.

> > Cgroup awareness is desired because the intent
> > is to use this for resource management as well (potentially along with
> > other cgroup controlled resources.) I will dig into bpf_lsm and learn
> > more about it.
>
> Also consider that bpf_lsm hooks have a way to get cgroup-id without
> being explicitly scoped. So the bpf program can be made cgroup aware.
> It's just not as convenient as attaching a prog to cgroup+hook at once.
> For prototyping the existing bpf_lsm facility should be enough.
> So please try to follow this route and please share more details about
> the use case.

Ok, I will take a look and see if that is sufficient. My understanding of
bpf-cgroup is that it not only makes attaching a prog to a cgroup easier,
it also handles the hierarchical invocation of the attached progs, which
may be useful if users want to manage GPU resources with bpf-cgroup
alongside other cgroup-controlled resources (cpu/mem/io, etc.)

About the use case: the high-level motivation is to provide the ability
to subdivide/share a GPU via cgroups/containers, similar to other
resources like CPU and memory. Users have been requesting this type of
functionality because GPU compute can be expensive, and they want to
maximize utilization to get the most bang for their buck. The traditional
way to do this is via SR-IOV/virtualization, but that usually means
time-sharing the GPU as a whole unit, which is useful for some
applications but not others because of the flushing and added latency.
We also have a study that identified various GPU compute application
types that can benefit from more asymmetrical/granular sharing of the
GPU (for example, some applications are compute bound while others are
memory bound and would benefit from having more VRAM.)

I have been trying to add a cgroup controller for the drm subsystem for
this purpose, but I ran into two challenges. First, the composition of a
GPU, and how subcomponents such as VRAM or shader engines/compute units
can be shared, are very much vendor specific, so we have been unable to
arrive at a common interface across all vendors. Second, partly because
of this and partly because of the variety of places a GPU can be deployed
(smartphone, PC, server, HPC), there is no agreement on how exactly a GPU
should be shared. The best way forward appears to be to simply provide
hooks for users to define how and what they want to share via a bpf
program.

From what I can tell so far (I am still learning), there are multiple
pieces that need to fall into place for bpf-cgroup to work for this use
case. First there is resource limit enforcement, which is the motivation
for this RFC (I will look into bpf_lsm as the path forward.) I have also
been thinking about instrumenting the drm subsystem with a new BPF
program type that has various attach types across the drm subsystem, but
I am not sure if that is allowed (this one is more for resource usage
monitoring.) Another thing I have been considering is to have the gpu
driver provide bpf helper functions so that bpf programs can modify drm
driver internals. That was the reason I asked about the potential of BTF
support for kernel modules a couple of months ago (and Andrii Nakryiko
mentioned that it is being worked on.)

Please feel free to ask more questions if any of the above is unclear.
Feedback is always welcome.

Regards,
Kenny
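
P.S. To make the bpf_lsm route concrete, here is a minimal, untested
sketch of the kind of cgroup-aware check I have in mind. The program
name, the choice of the file_open hook, and the single allowed_cgroup_id
knob are placeholders for illustration; a real policy would more likely
consult a map keyed by cgroup id.

#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Char major of drm device nodes (/dev/dri/*). */
#define DRM_MAJOR 226

/* Placeholder knob: the one cgroup allowed to open drm devices.
 * User space sets this (e.g. via the skeleton's rodata) before load. */
const volatile u64 allowed_cgroup_id;

char LICENSE[] SEC("license") = "GPL";

SEC("lsm/file_open")
int BPF_PROG(drm_open_check, struct file *file)
{
	u32 rdev = file->f_inode->i_rdev;

	/* Only police drm device nodes; let every other open through.
	 * The kernel dev_t encodes the major in the top 12 bits. */
	if ((rdev >> 20) != DRM_MAJOR)
		return 0;

	/* This is the part Alexei pointed at: the program becomes
	 * cgroup aware via the current task's cgroup id, without
	 * being attached to a specific cgroup. */
	if (bpf_get_current_cgroup_id() == allowed_cgroup_id)
		return 0;

	return -EPERM;
}

User space would load this with libbpf and attach it with
bpf_program__attach_lsm(); the kernel needs CONFIG_BPF_LSM=y and "bpf"
in the CONFIG_LSM list for the hook to fire. If this pattern holds up,
the same cgroup-id trick should extend to whatever drm-specific hooks
come out of the discussion above.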