Re: [RFC v1 00/14] Implement call_rcu_lazy() and miscellaneous fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, May 14, 2022 at 10:25 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, May 13, 2022 at 05:43:51PM +0200, Uladzislau Rezki wrote:
> > > On Fri, May 13, 2022 at 9:36 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > >
> > > > > > On Thu, May 12, 2022 at 10:37 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > On Thu, May 12, 2022 at 03:56:37PM +0200, Uladzislau Rezki wrote:
> > > > > > > > > Never mind. I port it into 5.10
> > > > > > > >
> > > > > > > > Oh, this is on mainline. Sorry about that. If you want I have a tree here for
> > > > > > > > 5.10 , although that does not have the kfree changes, everything else is
> > > > > > > > ditto.
> > > > > > > > https://github.com/joelagnel/linux-kernel/tree/rcu-nocb-4
> > > > > > > >
> > > > > > > No problem. kfree_rcu two patches are not so important in this series.
> > > > > > > So i have backported them into my 5.10 kernel because the latest kernel
> > > > > > > is not so easy to up and run on my device :)
> > > > > >
> > > > > > Actually I was going to write here, apparently some tests are showing
> > > > > > kfree_rcu()->call_rcu_lazy() causing possible regression. So it is
> > > > > > good to drop those for your initial testing!
> > > > > >
> > > > > Yep, i dropped both. The one that make use of call_rcu_lazy() seems not
> > > > > so important for kfree_rcu() because we do batch requests there anyway.
> > > > > One thing that i would like to improve in kfree_rcu() is a better utilization
> > > > > of page slots.
> > > > >
> > > > > I will share my results either tomorrow or on Monday. I hope that is fine.
> > > > >
> > > >
> > > > Here we go with some data on our Android handset that runs 5.10 kernel. The test
> > > > case i have checked was a "static image" use case. Condition is: screen ON with
> > > > disabled all connectivity.
> > > >
> > > > 1.
> > > > First data i took is how many wakeups cause an RCU subsystem during this test case
> > > > when everything is pretty idling. Duration is 360 seconds:
> > > >
> > > > <snip>
> > > > serezkiul@seldlx26095:~/data/call_rcu_lazy$ ./psp ./perf_360_sec_rcu_lazy_off.script | sort -nk 6 | grep rcu
> > >
> > > Nice! Do you mind sharing this script? I was just talking to Rushikesh
> > > that we want something like this during testing. Appreciate it. Also,
> > > if we dump timer wakeup reasons/callbacks that would also be awesome.
> > >
> > Please find in attachment. I wrote it once upon a time and make use of
> > to parse "perf script" output, i.e. raw data. The file name is: perf_script_parser.c
> > so just compile it.
> >
> > How to use it:
> > 1. run perf: './perf sched record -a -- sleep "how much in sec you want to collect data"'
> > 2. ./perf script -i ./perf.data > foo.script
> > 3. ./perf_script_parser ./foo.script
>
> Thanks a lot for sharing this. I think it will be quite useful. FWIW, I also
> use "perf sched record" and "perf sched report --sort latency" to get wakeup
> latencies.
>
> > > FWIW, I wrote a BPF tool that periodically dumps callbacks and can
> > > share that with you on request as well. That is probably not in a
> > > shape for mainline though (Makefile missing and such).
> > >
> > Yep, please share!
>
> Sure, check out my bcc repo from here:
> https://github.com/joelagnel/bcc

Kind of random, but this is why I love sharing my tools with others. I
actually forgot that I checked in the code for my rcutop here, and
this email is how I found it, lol :)

Did you happen to try it out too btw? :)

Thanks,

Joel


>
> Build this project, then cd libbpf-tools and run make. This should produce a
> static binary 'rcutop' which you can push to Android. You have to build for
> ARM which bcc should have instructions for. I have also included the rcuptop
> diff at the end of this file for reference.
>
> > > > 2.
> > > > Please find in attachment two power plots. The same test case. One is related to a
> > > > regular use of call_rcu() and second one is "lazy" usage. There is light a difference
> > > > in power, it is ~2mA. Event though it is rather small but it is detectable and solid
> > > > what is also important, thus it proofs the concept. Please note it might be more power
> > > > efficient for other arches and platforms. Because of different HW design that is related
> > > > to C-states of CPU and energy that is needed to in/out of those deep power states.
> > >
> > > Nice! I wonder if you still have other frequent callbacks on your
> > > system that are getting queued during the tests. Could you dump the
> > > rcu_callbacks trace event and see if you have any CBs frequently
> > > called that the series did not address?
> > >
> > I have pretty much like this:
> > <snip>
> >     rcuop/2-33      [002] d..1  6172.420541: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/1-26      [001] d..1  6173.131965: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [001] d..1  6173.696540: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/3-40      [003] d..1  6173.703695: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [001] d..1  6173.711607: rcu_batch_start: rcu_preempt CBs=1667 bl=13
> >     rcuop/1-26      [000] d..1  6175.619722: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/2-33      [001] d..1  6176.135844: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/3-40      [002] d..1  6176.303723: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [002] d..1  6176.519894: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [003] d..1  6176.527895: rcu_batch_start: rcu_preempt CBs=273 bl=10
> >     rcuop/1-26      [003] d..1  6178.543729: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/1-26      [003] d..1  6178.551707: rcu_batch_start: rcu_preempt CBs=1317 bl=10
> >     rcuop/0-15      [003] d..1  6178.819698: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [003] d..1  6178.827734: rcu_batch_start: rcu_preempt CBs=949 bl=10
> >     rcuop/3-40      [001] d..1  6179.203645: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/2-33      [001] d..1  6179.455747: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/2-33      [002] d..1  6179.471725: rcu_batch_start: rcu_preempt CBs=1983 bl=15
> >     rcuop/1-26      [003] d..1  6181.287646: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/1-26      [003] d..1  6181.295607: rcu_batch_start: rcu_preempt CBs=55 bl=10
> > <snip>
> >
> > so almost everything is batched.
>
> Nice, glad to know this is happening even without the kfree_rcu() changes.
>
> > > Also, one more thing I was curious about is - do you see savings when
> > > you pin the rcu threads to the LITTLE CPUs of the system? The theory
> > > being, not disturbing the BIG CPUs which are more power hungry may let
> > > them go into a deeper idle state and save power (due to leakage
> > > current and so forth).
> > >
> > I did some experimenting to pin nocbs to a little cluster. For idle use
> > cases i did not see any power gain. For heavy one i see that "big" CPUs
> > are also invoking and busy with it quite often. Probably i should think
> > of some use case where i can detect the power difference. If you have
> > something please let me know.
>
> Yeah, probably screen off + audio playback might be a good one, because it
> lightly loads the CPUs.
>
> > > > So a front-lazy-batching is something worth to have, IMHO :)
> > >
> > > Exciting! Being lazy pays off some times ;-) ;-). If you are Ok with
> > > it, we can add your data to the LPC slides as well about your
> > > investigation (with attribution to you).
> > >
> > No problem, since we will give a talk on LPC, more data we have more
> > convincing we are :)
>
> I forget, you did mention you are Ok with presenting with us right? It would
> be great if you present your data when we come to Android, if you are OK with
> it. I'll start a common slide deck soon and share so you, Rushikesh and me
> can add slides to it and present together.
>
> thanks,
>
>  - Joel
>
> ---8<-----------------------
>
> From: Joel Fernandes <joelaf@xxxxxxxxxx>
> Subject: [PATCH] rcutop
>
> Signed-off-by: Joel Fernandes <joelaf@xxxxxxxxxx>
> ---
>  libbpf-tools/Makefile     |   1 +
>  libbpf-tools/rcutop.bpf.c |  56 ++++++++
>  libbpf-tools/rcutop.c     | 288 ++++++++++++++++++++++++++++++++++++++
>  libbpf-tools/rcutop.h     |   8 ++
>  4 files changed, 353 insertions(+)
>  create mode 100644 libbpf-tools/rcutop.bpf.c
>  create mode 100644 libbpf-tools/rcutop.c
>  create mode 100644 libbpf-tools/rcutop.h
>
> diff --git a/libbpf-tools/Makefile b/libbpf-tools/Makefile
> index e60ec409..0d4cdff2 100644
> --- a/libbpf-tools/Makefile
> +++ b/libbpf-tools/Makefile
> @@ -42,6 +42,7 @@ APPS = \
>         klockstat \
>         ksnoop \
>         llcstat \
> +       rcutop \
>         mountsnoop \
>         numamove \
>         offcputime \
> diff --git a/libbpf-tools/rcutop.bpf.c b/libbpf-tools/rcutop.bpf.c
> new file mode 100644
> index 00000000..8287bbe2
> --- /dev/null
> +++ b/libbpf-tools/rcutop.bpf.c
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/* Copyright (c) 2021 Hengqi Chen */
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_core_read.h>
> +#include <bpf/bpf_tracing.h>
> +#include "rcutop.h"
> +#include "maps.bpf.h"
> +
> +#define MAX_ENTRIES    10240
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __uint(max_entries, MAX_ENTRIES);
> +       __type(key, void *);
> +       __type(value, int);
> +} cbs_queued SEC(".maps");
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __uint(max_entries, MAX_ENTRIES);
> +       __type(key, void *);
> +       __type(value, int);
> +} cbs_executed SEC(".maps");
> +
> +SEC("tracepoint/rcu/rcu_callback")
> +int tracepoint_rcu_callback(struct trace_event_raw_rcu_callback* ctx)
> +{
> +       void *key = ctx->func;
> +       int *val = NULL;
> +       static const int zero;
> +
> +       val = bpf_map_lookup_or_try_init(&cbs_queued, &key, &zero);
> +       if (val) {
> +               __sync_fetch_and_add(val, 1);
> +       }
> +
> +       return 0;
> +}
> +
> +SEC("tracepoint/rcu/rcu_invoke_callback")
> +int tracepoint_rcu_invoke_callback(struct trace_event_raw_rcu_invoke_callback* ctx)
> +{
> +       void *key = ctx->func;
> +       int *val;
> +       int zero = 0;
> +
> +       val = bpf_map_lookup_or_try_init(&cbs_executed, (void *)&key, (void *)&zero);
> +       if (val) {
> +               __sync_fetch_and_add(val, 1);
> +       }
> +
> +       return 0;
> +}
> +
> +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> diff --git a/libbpf-tools/rcutop.c b/libbpf-tools/rcutop.c
> new file mode 100644
> index 00000000..35795875
> --- /dev/null
> +++ b/libbpf-tools/rcutop.c
> @@ -0,0 +1,288 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * rcutop
> + * Copyright (c) 2022 Joel Fernandes
> + *
> + * 05-May-2022   Joel Fernandes   Created this.
> + */
> +#include <argp.h>
> +#include <errno.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include <bpf/libbpf.h>
> +#include <bpf/bpf.h>
> +#include "rcutop.h"
> +#include "rcutop.skel.h"
> +#include "btf_helpers.h"
> +#include "trace_helpers.h"
> +
> +#define warn(...) fprintf(stderr, __VA_ARGS__)
> +#define OUTPUT_ROWS_LIMIT 10240
> +
> +static volatile sig_atomic_t exiting = 0;
> +
> +static bool clear_screen = true;
> +static int output_rows = 20;
> +static int interval = 1;
> +static int count = 99999999;
> +static bool verbose = false;
> +
> +const char *argp_program_version = "rcutop 0.1";
> +const char *argp_program_bug_address =
> +"https://github.com/iovisor/bcc/tree/master/libbpf-tools";;
> +const char argp_program_doc[] =
> +"Show RCU callback queuing and execution stats.\n"
> +"\n"
> +"USAGE: rcutop [-h] [interval] [count]\n"
> +"\n"
> +"EXAMPLES:\n"
> +"    rcutop            # rcu activity top, refresh every 1s\n"
> +"    rcutop 5 10       # 5s summaries, 10 times\n";
> +
> +static const struct argp_option opts[] = {
> +       { "noclear", 'C', NULL, 0, "Don't clear the screen" },
> +       { "rows", 'r', "ROWS", 0, "Maximum rows to print, default 20" },
> +       { "verbose", 'v', NULL, 0, "Verbose debug output" },
> +       { NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" },
> +       {},
> +};
> +
> +static error_t parse_arg(int key, char *arg, struct argp_state *state)
> +{
> +       long rows;
> +       static int pos_args;
> +
> +       switch (key) {
> +               case 'C':
> +                       clear_screen = false;
> +                       break;
> +               case 'v':
> +                       verbose = true;
> +                       break;
> +               case 'h':
> +                       argp_state_help(state, stderr, ARGP_HELP_STD_HELP);
> +                       break;
> +               case 'r':
> +                       errno = 0;
> +                       rows = strtol(arg, NULL, 10);
> +                       if (errno || rows <= 0) {
> +                               warn("invalid rows: %s\n", arg);
> +                               argp_usage(state);
> +                       }
> +                       output_rows = rows;
> +                       if (output_rows > OUTPUT_ROWS_LIMIT)
> +                               output_rows = OUTPUT_ROWS_LIMIT;
> +                       break;
> +               case ARGP_KEY_ARG:
> +                       errno = 0;
> +                       if (pos_args == 0) {
> +                               interval = strtol(arg, NULL, 10);
> +                               if (errno || interval <= 0) {
> +                                       warn("invalid interval\n");
> +                                       argp_usage(state);
> +                               }
> +                       } else if (pos_args == 1) {
> +                               count = strtol(arg, NULL, 10);
> +                               if (errno || count <= 0) {
> +                                       warn("invalid count\n");
> +                                       argp_usage(state);
> +                               }
> +                       } else {
> +                               warn("unrecognized positional argument: %s\n", arg);
> +                               argp_usage(state);
> +                       }
> +                       pos_args++;
> +                       break;
> +               default:
> +                       return ARGP_ERR_UNKNOWN;
> +       }
> +       return 0;
> +}
> +
> +static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
> +{
> +       if (level == LIBBPF_DEBUG && !verbose)
> +               return 0;
> +       return vfprintf(stderr, format, args);
> +}
> +
> +static void sig_int(int signo)
> +{
> +       exiting = 1;
> +}
> +
> +static int print_stat(struct ksyms *ksyms, struct syms_cache *syms_cache,
> +               struct rcutop_bpf *obj)
> +{
> +       void *key, **prev_key = NULL;
> +       int n, err = 0;
> +       int qfd = bpf_map__fd(obj->maps.cbs_queued);
> +       int efd = bpf_map__fd(obj->maps.cbs_executed);
> +       const struct ksym *ksym;
> +       FILE *f;
> +       time_t t;
> +       struct tm *tm;
> +       char ts[16], buf[256];
> +
> +       f = fopen("/proc/loadavg", "r");
> +       if (f) {
> +               time(&t);
> +               tm = localtime(&t);
> +               strftime(ts, sizeof(ts), "%H:%M:%S", tm);
> +               memset(buf, 0, sizeof(buf));
> +               n = fread(buf, 1, sizeof(buf), f);
> +               if (n)
> +                       printf("%8s loadavg: %s\n", ts, buf);
> +               fclose(f);
> +       }
> +
> +       printf("%-32s %-6s %-6s\n", "Callback", "Queued", "Executed");
> +
> +       while (1) {
> +               int qcount = 0, ecount = 0;
> +
> +               err = bpf_map_get_next_key(qfd, prev_key, &key);
> +               if (err) {
> +                       if (errno == ENOENT) {
> +                               err = 0;
> +                               break;
> +                       }
> +                       warn("bpf_map_get_next_key failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +
> +               err = bpf_map_lookup_elem(qfd, &key, &qcount);
> +               if (err) {
> +                       warn("bpf_map_lookup_elem failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +               prev_key = &key;
> +
> +               bpf_map_lookup_elem(efd, &key, &ecount);
> +
> +               ksym = ksyms__map_addr(ksyms, (unsigned long)key);
> +               printf("%-32s %-6d %-6d\n",
> +                               ksym ? ksym->name : "Unknown",
> +                               qcount, ecount);
> +       }
> +       printf("\n");
> +       prev_key = NULL;
> +       while (1) {
> +               err = bpf_map_get_next_key(qfd, prev_key, &key);
> +               if (err) {
> +                       if (errno == ENOENT) {
> +                               err = 0;
> +                               break;
> +                       }
> +                       warn("bpf_map_get_next_key failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +               err = bpf_map_delete_elem(qfd, &key);
> +               if (err) {
> +                       if (errno == ENOENT) {
> +                               err = 0;
> +                               continue;
> +                       }
> +                       warn("bpf_map_delete_elem failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +
> +               bpf_map_delete_elem(efd, &key);
> +               prev_key = &key;
> +       }
> +
> +       return err;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       LIBBPF_OPTS(bpf_object_open_opts, open_opts);
> +       static const struct argp argp = {
> +               .options = opts,
> +               .parser = parse_arg,
> +               .doc = argp_program_doc,
> +       };
> +       struct rcutop_bpf *obj;
> +       int err;
> +       struct syms_cache *syms_cache = NULL;
> +       struct ksyms *ksyms = NULL;
> +
> +       err = argp_parse(&argp, argc, argv, 0, NULL, NULL);
> +       if (err)
> +               return err;
> +
> +       libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
> +       libbpf_set_print(libbpf_print_fn);
> +
> +       err = ensure_core_btf(&open_opts);
> +       if (err) {
> +               fprintf(stderr, "failed to fetch necessary BTF for CO-RE: %s\n", strerror(-err));
> +               return 1;
> +       }
> +
> +       obj = rcutop_bpf__open_opts(&open_opts);
> +       if (!obj) {
> +               warn("failed to open BPF object\n");
> +               return 1;
> +       }
> +
> +       err = rcutop_bpf__load(obj);
> +       if (err) {
> +               warn("failed to load BPF object: %d\n", err);
> +               goto cleanup;
> +       }
> +
> +       err = rcutop_bpf__attach(obj);
> +       if (err) {
> +               warn("failed to attach BPF programs: %d\n", err);
> +               goto cleanup;
> +       }
> +
> +       ksyms = ksyms__load();
> +       if (!ksyms) {
> +               fprintf(stderr, "failed to load kallsyms\n");
> +               goto cleanup;
> +       }
> +
> +       syms_cache = syms_cache__new(0);
> +       if (!syms_cache) {
> +               fprintf(stderr, "failed to create syms_cache\n");
> +               goto cleanup;
> +       }
> +
> +       if (signal(SIGINT, sig_int) == SIG_ERR) {
> +               warn("can't set signal handler: %s\n", strerror(errno));
> +               err = 1;
> +               goto cleanup;
> +       }
> +
> +       while (1) {
> +               sleep(interval);
> +
> +               if (clear_screen) {
> +                       err = system("clear");
> +                       if (err)
> +                               goto cleanup;
> +               }
> +
> +               err = print_stat(ksyms, syms_cache, obj);
> +               if (err)
> +                       goto cleanup;
> +
> +               count--;
> +               if (exiting || !count)
> +                       goto cleanup;
> +       }
> +
> +cleanup:
> +       rcutop_bpf__destroy(obj);
> +       cleanup_core_btf(&open_opts);
> +
> +       return err != 0;
> +}
> diff --git a/libbpf-tools/rcutop.h b/libbpf-tools/rcutop.h
> new file mode 100644
> index 00000000..cb2a3557
> --- /dev/null
> +++ b/libbpf-tools/rcutop.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +#ifndef __RCUTOP_H
> +#define __RCUTOP_H
> +
> +#define PATH_MAX       4096
> +#define TASK_COMM_LEN  16
> +
> +#endif /* __RCUTOP_H */
> --
> 2.36.0.550.gb090851708-goog
>

On Sat, May 14, 2022 at 10:25 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, May 13, 2022 at 05:43:51PM +0200, Uladzislau Rezki wrote:
> > > On Fri, May 13, 2022 at 9:36 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > >
> > > > > > On Thu, May 12, 2022 at 10:37 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > On Thu, May 12, 2022 at 03:56:37PM +0200, Uladzislau Rezki wrote:
> > > > > > > > > Never mind. I port it into 5.10
> > > > > > > >
> > > > > > > > Oh, this is on mainline. Sorry about that. If you want I have a tree here for
> > > > > > > > 5.10 , although that does not have the kfree changes, everything else is
> > > > > > > > ditto.
> > > > > > > > https://github.com/joelagnel/linux-kernel/tree/rcu-nocb-4
> > > > > > > >
> > > > > > > No problem. kfree_rcu two patches are not so important in this series.
> > > > > > > So i have backported them into my 5.10 kernel because the latest kernel
> > > > > > > is not so easy to up and run on my device :)
> > > > > >
> > > > > > Actually I was going to write here, apparently some tests are showing
> > > > > > kfree_rcu()->call_rcu_lazy() causing possible regression. So it is
> > > > > > good to drop those for your initial testing!
> > > > > >
> > > > > Yep, i dropped both. The one that make use of call_rcu_lazy() seems not
> > > > > so important for kfree_rcu() because we do batch requests there anyway.
> > > > > One thing that i would like to improve in kfree_rcu() is a better utilization
> > > > > of page slots.
> > > > >
> > > > > I will share my results either tomorrow or on Monday. I hope that is fine.
> > > > >
> > > >
> > > > Here we go with some data on our Android handset that runs 5.10 kernel. The test
> > > > case i have checked was a "static image" use case. Condition is: screen ON with
> > > > disabled all connectivity.
> > > >
> > > > 1.
> > > > First data i took is how many wakeups cause an RCU subsystem during this test case
> > > > when everything is pretty idling. Duration is 360 seconds:
> > > >
> > > > <snip>
> > > > serezkiul@seldlx26095:~/data/call_rcu_lazy$ ./psp ./perf_360_sec_rcu_lazy_off.script | sort -nk 6 | grep rcu
> > >
> > > Nice! Do you mind sharing this script? I was just talking to Rushikesh
> > > that we want something like this during testing. Appreciate it. Also,
> > > if we dump timer wakeup reasons/callbacks that would also be awesome.
> > >
> > Please find in attachment. I wrote it once upon a time and make use of
> > to parse "perf script" output, i.e. raw data. The file name is: perf_script_parser.c
> > so just compile it.
> >
> > How to use it:
> > 1. run perf: './perf sched record -a -- sleep "how much in sec you want to collect data"'
> > 2. ./perf script -i ./perf.data > foo.script
> > 3. ./perf_script_parser ./foo.script
>
> Thanks a lot for sharing this. I think it will be quite useful. FWIW, I also
> use "perf sched record" and "perf sched report --sort latency" to get wakeup
> latencies.
>
> > > FWIW, I wrote a BPF tool that periodically dumps callbacks and can
> > > share that with you on request as well. That is probably not in a
> > > shape for mainline though (Makefile missing and such).
> > >
> > Yep, please share!
>
> Sure, check out my bcc repo from here:
> https://github.com/joelagnel/bcc
>
> Build this project, then cd libbpf-tools and run make. This should produce a
> static binary 'rcutop' which you can push to Android. You have to build for
> ARM which bcc should have instructions for. I have also included the rcuptop
> diff at the end of this file for reference.
>
> > > > 2.
> > > > Please find in attachment two power plots. The same test case. One is related to a
> > > > regular use of call_rcu() and second one is "lazy" usage. There is light a difference
> > > > in power, it is ~2mA. Event though it is rather small but it is detectable and solid
> > > > what is also important, thus it proofs the concept. Please note it might be more power
> > > > efficient for other arches and platforms. Because of different HW design that is related
> > > > to C-states of CPU and energy that is needed to in/out of those deep power states.
> > >
> > > Nice! I wonder if you still have other frequent callbacks on your
> > > system that are getting queued during the tests. Could you dump the
> > > rcu_callbacks trace event and see if you have any CBs frequently
> > > called that the series did not address?
> > >
> > I have pretty much like this:
> > <snip>
> >     rcuop/2-33      [002] d..1  6172.420541: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/1-26      [001] d..1  6173.131965: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [001] d..1  6173.696540: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/3-40      [003] d..1  6173.703695: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [001] d..1  6173.711607: rcu_batch_start: rcu_preempt CBs=1667 bl=13
> >     rcuop/1-26      [000] d..1  6175.619722: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/2-33      [001] d..1  6176.135844: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/3-40      [002] d..1  6176.303723: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [002] d..1  6176.519894: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [003] d..1  6176.527895: rcu_batch_start: rcu_preempt CBs=273 bl=10
> >     rcuop/1-26      [003] d..1  6178.543729: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/1-26      [003] d..1  6178.551707: rcu_batch_start: rcu_preempt CBs=1317 bl=10
> >     rcuop/0-15      [003] d..1  6178.819698: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/0-15      [003] d..1  6178.827734: rcu_batch_start: rcu_preempt CBs=949 bl=10
> >     rcuop/3-40      [001] d..1  6179.203645: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/2-33      [001] d..1  6179.455747: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/2-33      [002] d..1  6179.471725: rcu_batch_start: rcu_preempt CBs=1983 bl=15
> >     rcuop/1-26      [003] d..1  6181.287646: rcu_batch_start: rcu_preempt CBs=2048 bl=16
> >     rcuop/1-26      [003] d..1  6181.295607: rcu_batch_start: rcu_preempt CBs=55 bl=10
> > <snip>
> >
> > so almost everything is batched.
>
> Nice, glad to know this is happening even without the kfree_rcu() changes.
>
> > > Also, one more thing I was curious about is - do you see savings when
> > > you pin the rcu threads to the LITTLE CPUs of the system? The theory
> > > being, not disturbing the BIG CPUs which are more power hungry may let
> > > them go into a deeper idle state and save power (due to leakage
> > > current and so forth).
> > >
> > I did some experimenting to pin nocbs to a little cluster. For idle use
> > cases i did not see any power gain. For heavy one i see that "big" CPUs
> > are also invoking and busy with it quite often. Probably i should think
> > of some use case where i can detect the power difference. If you have
> > something please let me know.
>
> Yeah, probably screen off + audio playback might be a good one, because it
> lightly loads the CPUs.
>
> > > > So a front-lazy-batching is something worth to have, IMHO :)
> > >
> > > Exciting! Being lazy pays off some times ;-) ;-). If you are Ok with
> > > it, we can add your data to the LPC slides as well about your
> > > investigation (with attribution to you).
> > >
> > No problem, since we will give a talk on LPC, more data we have more
> > convincing we are :)
>
> I forget, you did mention you are Ok with presenting with us right? It would
> be great if you present your data when we come to Android, if you are OK with
> it. I'll start a common slide deck soon and share so you, Rushikesh and me
> can add slides to it and present together.
>
> thanks,
>
>  - Joel
>
> ---8<-----------------------
>
> From: Joel Fernandes <joelaf@xxxxxxxxxx>
> Subject: [PATCH] rcutop
>
> Signed-off-by: Joel Fernandes <joelaf@xxxxxxxxxx>
> ---
>  libbpf-tools/Makefile     |   1 +
>  libbpf-tools/rcutop.bpf.c |  56 ++++++++
>  libbpf-tools/rcutop.c     | 288 ++++++++++++++++++++++++++++++++++++++
>  libbpf-tools/rcutop.h     |   8 ++
>  4 files changed, 353 insertions(+)
>  create mode 100644 libbpf-tools/rcutop.bpf.c
>  create mode 100644 libbpf-tools/rcutop.c
>  create mode 100644 libbpf-tools/rcutop.h
>
> diff --git a/libbpf-tools/Makefile b/libbpf-tools/Makefile
> index e60ec409..0d4cdff2 100644
> --- a/libbpf-tools/Makefile
> +++ b/libbpf-tools/Makefile
> @@ -42,6 +42,7 @@ APPS = \
>         klockstat \
>         ksnoop \
>         llcstat \
> +       rcutop \
>         mountsnoop \
>         numamove \
>         offcputime \
> diff --git a/libbpf-tools/rcutop.bpf.c b/libbpf-tools/rcutop.bpf.c
> new file mode 100644
> index 00000000..8287bbe2
> --- /dev/null
> +++ b/libbpf-tools/rcutop.bpf.c
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/* Copyright (c) 2021 Hengqi Chen */
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_core_read.h>
> +#include <bpf/bpf_tracing.h>
> +#include "rcutop.h"
> +#include "maps.bpf.h"
> +
> +#define MAX_ENTRIES    10240
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __uint(max_entries, MAX_ENTRIES);
> +       __type(key, void *);
> +       __type(value, int);
> +} cbs_queued SEC(".maps");
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __uint(max_entries, MAX_ENTRIES);
> +       __type(key, void *);
> +       __type(value, int);
> +} cbs_executed SEC(".maps");
> +
> +SEC("tracepoint/rcu/rcu_callback")
> +int tracepoint_rcu_callback(struct trace_event_raw_rcu_callback* ctx)
> +{
> +       void *key = ctx->func;
> +       int *val = NULL;
> +       static const int zero;
> +
> +       val = bpf_map_lookup_or_try_init(&cbs_queued, &key, &zero);
> +       if (val) {
> +               __sync_fetch_and_add(val, 1);
> +       }
> +
> +       return 0;
> +}
> +
> +SEC("tracepoint/rcu/rcu_invoke_callback")
> +int tracepoint_rcu_invoke_callback(struct trace_event_raw_rcu_invoke_callback* ctx)
> +{
> +       void *key = ctx->func;
> +       int *val;
> +       int zero = 0;
> +
> +       val = bpf_map_lookup_or_try_init(&cbs_executed, (void *)&key, (void *)&zero);
> +       if (val) {
> +               __sync_fetch_and_add(val, 1);
> +       }
> +
> +       return 0;
> +}
> +
> +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> diff --git a/libbpf-tools/rcutop.c b/libbpf-tools/rcutop.c
> new file mode 100644
> index 00000000..35795875
> --- /dev/null
> +++ b/libbpf-tools/rcutop.c
> @@ -0,0 +1,288 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * rcutop
> + * Copyright (c) 2022 Joel Fernandes
> + *
> + * 05-May-2022   Joel Fernandes   Created this.
> + */
> +#include <argp.h>
> +#include <errno.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include <bpf/libbpf.h>
> +#include <bpf/bpf.h>
> +#include "rcutop.h"
> +#include "rcutop.skel.h"
> +#include "btf_helpers.h"
> +#include "trace_helpers.h"
> +
> +#define warn(...) fprintf(stderr, __VA_ARGS__)
> +#define OUTPUT_ROWS_LIMIT 10240
> +
> +static volatile sig_atomic_t exiting = 0;
> +
> +static bool clear_screen = true;
> +static int output_rows = 20;
> +static int interval = 1;
> +static int count = 99999999;
> +static bool verbose = false;
> +
> +const char *argp_program_version = "rcutop 0.1";
> +const char *argp_program_bug_address =
> +"https://github.com/iovisor/bcc/tree/master/libbpf-tools";;
> +const char argp_program_doc[] =
> +"Show RCU callback queuing and execution stats.\n"
> +"\n"
> +"USAGE: rcutop [-h] [interval] [count]\n"
> +"\n"
> +"EXAMPLES:\n"
> +"    rcutop            # rcu activity top, refresh every 1s\n"
> +"    rcutop 5 10       # 5s summaries, 10 times\n";
> +
> +static const struct argp_option opts[] = {
> +       { "noclear", 'C', NULL, 0, "Don't clear the screen" },
> +       { "rows", 'r', "ROWS", 0, "Maximum rows to print, default 20" },
> +       { "verbose", 'v', NULL, 0, "Verbose debug output" },
> +       { NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" },
> +       {},
> +};
> +
> +static error_t parse_arg(int key, char *arg, struct argp_state *state)
> +{
> +       long rows;
> +       static int pos_args;
> +
> +       switch (key) {
> +               case 'C':
> +                       clear_screen = false;
> +                       break;
> +               case 'v':
> +                       verbose = true;
> +                       break;
> +               case 'h':
> +                       argp_state_help(state, stderr, ARGP_HELP_STD_HELP);
> +                       break;
> +               case 'r':
> +                       errno = 0;
> +                       rows = strtol(arg, NULL, 10);
> +                       if (errno || rows <= 0) {
> +                               warn("invalid rows: %s\n", arg);
> +                               argp_usage(state);
> +                       }
> +                       output_rows = rows;
> +                       if (output_rows > OUTPUT_ROWS_LIMIT)
> +                               output_rows = OUTPUT_ROWS_LIMIT;
> +                       break;
> +               case ARGP_KEY_ARG:
> +                       errno = 0;
> +                       if (pos_args == 0) {
> +                               interval = strtol(arg, NULL, 10);
> +                               if (errno || interval <= 0) {
> +                                       warn("invalid interval\n");
> +                                       argp_usage(state);
> +                               }
> +                       } else if (pos_args == 1) {
> +                               count = strtol(arg, NULL, 10);
> +                               if (errno || count <= 0) {
> +                                       warn("invalid count\n");
> +                                       argp_usage(state);
> +                               }
> +                       } else {
> +                               warn("unrecognized positional argument: %s\n", arg);
> +                               argp_usage(state);
> +                       }
> +                       pos_args++;
> +                       break;
> +               default:
> +                       return ARGP_ERR_UNKNOWN;
> +       }
> +       return 0;
> +}
> +
> +static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
> +{
> +       if (level == LIBBPF_DEBUG && !verbose)
> +               return 0;
> +       return vfprintf(stderr, format, args);
> +}
> +
> +static void sig_int(int signo)
> +{
> +       exiting = 1;
> +}
> +
> +static int print_stat(struct ksyms *ksyms, struct syms_cache *syms_cache,
> +               struct rcutop_bpf *obj)
> +{
> +       void *key, **prev_key = NULL;
> +       int n, err = 0;
> +       int qfd = bpf_map__fd(obj->maps.cbs_queued);
> +       int efd = bpf_map__fd(obj->maps.cbs_executed);
> +       const struct ksym *ksym;
> +       FILE *f;
> +       time_t t;
> +       struct tm *tm;
> +       char ts[16], buf[256];
> +
> +       f = fopen("/proc/loadavg", "r");
> +       if (f) {
> +               time(&t);
> +               tm = localtime(&t);
> +               strftime(ts, sizeof(ts), "%H:%M:%S", tm);
> +               memset(buf, 0, sizeof(buf));
> +               n = fread(buf, 1, sizeof(buf), f);
> +               if (n)
> +                       printf("%8s loadavg: %s\n", ts, buf);
> +               fclose(f);
> +       }
> +
> +       printf("%-32s %-6s %-6s\n", "Callback", "Queued", "Executed");
> +
> +       while (1) {
> +               int qcount = 0, ecount = 0;
> +
> +               err = bpf_map_get_next_key(qfd, prev_key, &key);
> +               if (err) {
> +                       if (errno == ENOENT) {
> +                               err = 0;
> +                               break;
> +                       }
> +                       warn("bpf_map_get_next_key failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +
> +               err = bpf_map_lookup_elem(qfd, &key, &qcount);
> +               if (err) {
> +                       warn("bpf_map_lookup_elem failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +               prev_key = &key;
> +
> +               bpf_map_lookup_elem(efd, &key, &ecount);
> +
> +               ksym = ksyms__map_addr(ksyms, (unsigned long)key);
> +               printf("%-32s %-6d %-6d\n",
> +                               ksym ? ksym->name : "Unknown",
> +                               qcount, ecount);
> +       }
> +       printf("\n");
> +       prev_key = NULL;
> +       while (1) {
> +               err = bpf_map_get_next_key(qfd, prev_key, &key);
> +               if (err) {
> +                       if (errno == ENOENT) {
> +                               err = 0;
> +                               break;
> +                       }
> +                       warn("bpf_map_get_next_key failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +               err = bpf_map_delete_elem(qfd, &key);
> +               if (err) {
> +                       if (errno == ENOENT) {
> +                               err = 0;
> +                               continue;
> +                       }
> +                       warn("bpf_map_delete_elem failed: %s\n", strerror(errno));
> +                       return err;
> +               }
> +
> +               bpf_map_delete_elem(efd, &key);
> +               prev_key = &key;
> +       }
> +
> +       return err;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       LIBBPF_OPTS(bpf_object_open_opts, open_opts);
> +       static const struct argp argp = {
> +               .options = opts,
> +               .parser = parse_arg,
> +               .doc = argp_program_doc,
> +       };
> +       struct rcutop_bpf *obj;
> +       int err;
> +       struct syms_cache *syms_cache = NULL;
> +       struct ksyms *ksyms = NULL;
> +
> +       err = argp_parse(&argp, argc, argv, 0, NULL, NULL);
> +       if (err)
> +               return err;
> +
> +       libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
> +       libbpf_set_print(libbpf_print_fn);
> +
> +       err = ensure_core_btf(&open_opts);
> +       if (err) {
> +               fprintf(stderr, "failed to fetch necessary BTF for CO-RE: %s\n", strerror(-err));
> +               return 1;
> +       }
> +
> +       obj = rcutop_bpf__open_opts(&open_opts);
> +       if (!obj) {
> +               warn("failed to open BPF object\n");
> +               return 1;
> +       }
> +
> +       err = rcutop_bpf__load(obj);
> +       if (err) {
> +               warn("failed to load BPF object: %d\n", err);
> +               goto cleanup;
> +       }
> +
> +       err = rcutop_bpf__attach(obj);
> +       if (err) {
> +               warn("failed to attach BPF programs: %d\n", err);
> +               goto cleanup;
> +       }
> +
> +       ksyms = ksyms__load();
> +       if (!ksyms) {
> +               fprintf(stderr, "failed to load kallsyms\n");
> +               goto cleanup;
> +       }
> +
> +       syms_cache = syms_cache__new(0);
> +       if (!syms_cache) {
> +               fprintf(stderr, "failed to create syms_cache\n");
> +               goto cleanup;
> +       }
> +
> +       if (signal(SIGINT, sig_int) == SIG_ERR) {
> +               warn("can't set signal handler: %s\n", strerror(errno));
> +               err = 1;
> +               goto cleanup;
> +       }
> +
> +       while (1) {
> +               sleep(interval);
> +
> +               if (clear_screen) {
> +                       err = system("clear");
> +                       if (err)
> +                               goto cleanup;
> +               }
> +
> +               err = print_stat(ksyms, syms_cache, obj);
> +               if (err)
> +                       goto cleanup;
> +
> +               count--;
> +               if (exiting || !count)
> +                       goto cleanup;
> +       }
> +
> +cleanup:
> +       rcutop_bpf__destroy(obj);
> +       cleanup_core_btf(&open_opts);
> +
> +       return err != 0;
> +}
> diff --git a/libbpf-tools/rcutop.h b/libbpf-tools/rcutop.h
> new file mode 100644
> index 00000000..cb2a3557
> --- /dev/null
> +++ b/libbpf-tools/rcutop.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +#ifndef __RCUTOP_H
> +#define __RCUTOP_H
> +
> +#define PATH_MAX       4096
> +#define TASK_COMM_LEN  16
> +
> +#endif /* __RCUTOP_H */
> --
> 2.36.0.550.gb090851708-goog
>



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux