On Tue, Mar 22, 2022 at 11:09 AM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 3/16/22 9:35 AM, Yosry Ahmed wrote:
> > Hi Tejun,
> >
> > Thanks for taking the time to read my proposal! Sorry for the late
> > reply. This email skipped my inbox for some reason.
> >
> > On Sun, Mar 13, 2022 at 10:35 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> On Wed, Mar 09, 2022 at 12:27:15PM -0800, Yosry Ahmed wrote:
> >> ...
> >>> These problems are already addressed by the rstat aggregation
> >>> mechanism in the kernel, which is primarily used for memcg stats. We
> >>
> >> Not that it matters all that much but I don't think the above statement is
> >> true given that sched stats are an integrated part of the rstat
> >> implementation and io was converted before memcg.
> >>
> >
> > Excuse my ignorance, I am new to kernel development. I only saw calls
> > to cgroup_rstat_updated() in memcg and io and assumed they were the
> > only users. Now I found cpu_account_cputime() :)
> >
> >>> - For every cgroup, we will either use flags to distinguish BPF stats
> >>> updates from normal stats updates, or flush both anyway (memcg stats
> >>> are periodically flushed anyway).
> >>
> >> I'd just keep them together. Usually most activities tend to happen
> >> together, so it's cheaper to aggregate all of them in one go in most cases.
> >
> > This makes sense to me, thanks.
> >
> >>
> >>> - Provide flags to enable/disable using per-cpu arrays (for stats that
> >>> are not updated frequently), and enable/disable hierarchical
> >>> aggregation (for non-hierarchical stats, they can still make benefit
> >>> of the automatic entries creation & deletion).
> >>> - Provide different hierarchical aggregation operations : SUM, MAX, MIN, etc.
> >>> - Instead of an array as the map value, use a struct, and let the user
> >>> provide an aggregator function in the form of a BPF program.
> >>
> >> I'm more partial to the last option. It does make the usage a bit more
> >> compilcated but hopefully it shouldn't be too bad with good examples.
> >>
> >> I don't have strong opinions on the bpf side of things but it'd be great to
> >> be able to use rstat from bpf.
> >
> > It indeed gives more flexibility but is more complicated. Also, I am
> > not sure about the overhead to make calls to BPF programs in every
> > aggregation step. Looking forward to get feedback on the bpf side of
> > things.
>
> Hi, Yosry, I heard this was discussed in bpf office hour which I
> didn't attend. Could you summarize the conclusion and what is the
> step forward? We also have an internal tool which collects cgroup
> stats and this might help us as well. Thanks!
>
> >
> >>
> >> Thanks.
> >>
> >> --
> >> tejun

Hi Yonghong,

Hao has already done an excellent job summarizing the outcome of the
meeting. The idea I have is basically to introduce "rstat flushing"
BPF programs. BPF programs that collect and display stats would use
helpers to call cgroup_rstat_flush() and cgroup_rstat_updated() (or
similar). rstat would then make calls to the "rstat flushing" BPF
programs during flushes, similar to calls to css_rstat_flush().

I will work on an RFC patch(es) for this soon. Let me know if you have
any comments/suggestions/feedback.

Thanks!
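
To make the "rstat flushing" idea a bit more concrete, below is a very
rough sketch of what the two halves could look like from the BPF side.
Everything in it is made up for illustration: exposing
cgroup_rstat_updated()/cgroup_rstat_flush() to BPF as kfuncs, the
"bpf_rstat_flush" fentry attach point, and the single shared map (a
real version would keep the pending counts per-cpu, which is the whole
point of rstat) are all placeholders until the RFC settles the actual
interface.

/* Sketch only -- nothing below is an existing kernel interface. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* Hypothetical: assume rstat's entry points get exposed as kfuncs. */
extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;

struct attach_counter {
	__u64 pending;	/* updates not yet folded into total */
	__u64 total;	/* hierarchical total, fresh after a flush */
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u64);			/* cgroup id */
	__type(value, struct attach_counter);
} counters SEC(".maps");

static __always_inline struct attach_counter *get_counter(__u64 cg_id)
{
	struct attach_counter zero = {};
	struct attach_counter *c;

	c = bpf_map_lookup_elem(&counters, &cg_id);
	if (c)
		return c;
	bpf_map_update_elem(&counters, &cg_id, &zero, BPF_NOEXIST);
	return bpf_map_lookup_elem(&counters, &cg_id);
}

/*
 * Update side: count task attaches per cgroup, then tell rstat that
 * this (cgroup, cpu) pair has pending updates. The tracepoint is only
 * an example of "a BPF program that collects stats".
 */
SEC("tp_btf/cgroup_attach_task")
int BPF_PROG(count_attach, struct cgroup *dst_cgrp)
{
	struct attach_counter *c = get_counter(dst_cgrp->kn->id);

	if (!c)
		return 0;
	__sync_fetch_and_add(&c->pending, 1);
	cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
	return 0;
}

/*
 * Flush side: the "rstat flushing" program. The assumption is that
 * cgroup_rstat_flush() would invoke this for every (cgroup, cpu) with
 * pending updates, like it calls css_rstat_flush() for subsystems.
 * Per-cpu bookkeeping and atomicity are glossed over in this sketch.
 */
SEC("fentry/bpf_rstat_flush")
int BPF_PROG(flusher, struct cgroup *cgrp, struct cgroup *parent, int cpu)
{
	struct attach_counter *c = get_counter(cgrp->kn->id);
	struct attach_counter *pc;
	__u64 delta;

	if (!c || !c->pending)
		return 0;
	delta = c->pending;
	c->pending = 0;
	c->total += delta;

	/* Propagate upwards; rstat visits children before their parents. */
	if (parent) {
		pc = get_counter(parent->kn->id);
		if (pc)
			__sync_fetch_and_add(&pc->pending, delta);
	}
	return 0;
}

A program that wants to display the stats would first call the
cgroup_rstat_flush() kfunc on the cgroup of interest, so that "total"
is up to date when it reads the map.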