On 2/27/23 4:20 PM, Yafang Shao wrote:
Currently we can't get bpf memory usage reliably. bpftool now shows the bpf memory footprint, which is difference with bpf memory usage. The difference can be quite great in some cases, for example, - non-preallocated bpf map The non-preallocated bpf map memory usage is dynamically changed. The allocated elements count can be from 0 to the max entries. But the memory footprint in bpftool only shows a fixed number. - bpf metadata consumes more memory than bpf element In some corner cases, the bpf metadata can consumes a lot more memory than bpf element consumes. For example, it can happen when the element size is quite small. - some maps don't have key, value or max_entries For example the key_size and value_size of ringbuf is 0, so its memlock is always 0. We need a way to show the bpf memory usage especially there will be more and more bpf programs running on the production environment and thus the bpf memory usage is not trivial. This patchset introduces a new map ops ->map_mem_usage to calculate the memory usage. Note that we don't intend to make the memory usage 100% accurate, while our goal is to make sure there is only a small difference between what bpftool reports and the real memory. That small difference can be ignored compared to the total usage. That is enough to monitor the bpf memory usage. For example, the user can rely on this value to monitor the trend of bpf memory usage, compare the difference in bpf memory usage between different bpf program versions, figure out which maps consume large memory, and etc.
Now that there is the cgroup.memory=nobpf, this is now rebuilding the memory accounting as a band aid that you would otherwise get for free via memcg.. :/ Can't you instead move the selectable memcg forward? Tejun and others have brought up the resource domain concept, have you looked into it? Thanks, Daniel