On 10/25/24 12:40 AM, Yosry Ahmed wrote:
On Thu, Oct 24, 2024 at 6:16 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
On Thu, Oct 24, 2024 at 05:57:25PM GMT, Yosry Ahmed wrote:
On Thu, Oct 24, 2024 at 5:26 PM JP Kobryn <inwardvessel@xxxxxxxxx> wrote:
Make use of the flush tracepoint within memcontrol.
Signed-off-by: JP Kobryn <inwardvessel@xxxxxxxxx>
Is the intention to use tools like bpftrace to analyze where we flush
the most? In this case, why can't we just attach to the fentry of
do_flush_stats() and use the stack trace to find the path?
We can also attach to mem_cgroup_flush_stats(), and the difference in
counts between the two will be the number of skipped flushes.
All these functions can get inlined and then we can not really attach
easily. We can somehow find the offset in the inlined places and try to
use kprobe but it is prohibitive when have to do for multiple kernels
built with fdo/bolt.
Please note that tracepoints are not really API, so we can remove them
in future if we see no usage for them.
That's fair, but can we just add two tracepoints? This seems enough to
collect necessary data, and prevent proliferation of tracepoints and
the addition of the enum.
I am thinking one in mem_cgroup_flush_stats() and one in
do_flush_stats(), e.g. trace_mem_cgroup_flush_stats() and
trace_do_flush_stats(). Although the name of the latter is too
generic, maybe we should rename the function first to add mem_cgroup_*
or memcg_*.
WDYT?
Hmmm, I think if we did that we wouldn't get accurate info on when the
flush was skipped. Comparing the number of hits between
mem_cgroup_flush_stats() and do_flush_stats() to determine the number of
skips doesn't seem reliable because of the places where do_flush_stats()
is called outside of mem_cgroup_flush_stats(). There would be situations
where a skip occurs, but meanwhile each call to do_flush_stats() outside
of mem_cgroup_flush_stats() would effectively subtract that skip, making
it appear that a skip did not occur.
Maybe as a middle ground we could remove the trace calls for the zswap
and periodic cases, since no skips can occur there. We could then just
leave one trace call in mem_cgroup_flush_stats() and instead of an enum
we can pass a bool saying skipped or not. Something like this:
mem_cgroup_flush_stats()
{
bool needs_flush = memcg_vmstats_needs_flush(...);
trace_memcg_flush_stats(memcg, needs_flush);
if (needs_flush)
do_flush_stats(...);
}
Yosry/Shakeel, do you have any thoughts on whether we should keep the
trace calls for obj_cgroup_may_zswap() and periodic workqueue cases?