Re: [PATCH bpf-next] RFC: libbpf: resolve rodata lookups

On Wed, Jul 20, 2022 at 11:04:59AM -0700, sdf@xxxxxxxxxx wrote:
> On 07/19, sdf@xxxxxxxxxx wrote:
> > On 07/19, Alexei Starovoitov wrote:
> > > On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Motivation:
> > > > > > >
> > > > > > > Our bpf programs have a bunch of options which are set at load
> > > > > > > time. After loading, they don't change. We currently use an array
> > > > > > > map to store them, and the bpf program does the following:
> > > > > > >
> > > > > > > val = bpf_map_lookup_elem(&config_map, &key);
> > > > > > > if (likely(val && *val)) {
> > > > > > >   // do some optional feature
> > > > > > > }
> > > > > > >
> > > > > > > Since the configuration is static and we have a lot of those features,
> > > > > > > I feel like we're wasting precious cycles doing dynamic lookups
> > > > > > > (and stalling on memory loads).
> > > > > > >
> > > > > > > I was assuming that converting those to some fake kconfig options
> > > > > > > would solve it, but it still seems like kconfig is stored in the
> > > > > > > global map and kconfig entries are resolved dynamically.
> > > > > > >
> > > > > > > Proposal:
> > > > > > >
> > > > > > > Resolve kconfig options statically upon loading. Basically rewrite
> > > > > > > ld+ldx to two nops and 'mov val, x'.
> > > > > > >
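> > > > > > > Roughly, the intended rewrite (sketch, not real xlated output):
> > > > > > >
> > > > > > > 	; before			; after
> > > > > > > 	ld_imm64 r1, map[id][0]+off	ja 0 (nop)
> > > > > > > 	(2nd ld_imm64 slot)		ja 0 (nop)
> > > > > > > 	ldx r1, [r1 + 0]		mov r1, <val>
> > > > > > >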
> > > > > > > I'm also trying to rewrite the conditional jump when the condition is
> > > > > > > !imm. This seems to be catching all the cases in my program, but
> > > > > > > it's probably too hacky.
> > > > > > >
> > > > > > > I've attached a very raw RFC patch to demonstrate the idea. Anything
> > > > > > > I'm missing? Any potential problems with this approach?
> > > > > >
> > > > > > Have you considered using global variables for that?
> > > > > > With a skeleton, user space has a natural way to set
> > > > > > all of these knobs after doing skel_open and before skel_load.
> > > > > > The verifier then sees them as read-only vars and
> > > > > > automatically converts LDX into fixed constants, and if the code
> > > > > > looks like if (my_config_var), the verifier will remove
> > > > > > all the dead code too.
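> > > > > >
> > > > > > E.g., a minimal sketch with a generated skeleton (prog/var names
> > > > > > here are made up for illustration):
> > > > > >
> > > > > > 	/* BPF side: const volatile so it lands in .rodata */
> > > > > > 	const volatile int my_config_var = 0;
> > > > > >
> > > > > > 	/* user space, between open and load: */
> > > > > > 	struct my_prog *skel = my_prog__open();
> > > > > > 	skel->rodata->my_config_var = 1;
> > > > > > 	my_prog__load(skel); /* .rodata frozen, LDX folds to const */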
> > > > >
> > > > > Hm, that's a good alternative, let me try it out. Thanks!
> > > >
> > > > Turns out we already freeze the kconfig map in libbpf:
> > > > if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
> > > >         err = bpf_map_freeze(map->fd);
> > > >
> > > > And I've verified that I do hit bpf_map_direct_read in the verifier.
> > > >
> > > > But the code still stays the same (bpftool dump xlated):
> > > >   72: (18) r1 = map[id:24][0]+20
> > > >   74: (61) r1 = *(u32 *)(r1 +0)
> > > >   75: (bf) r2 = r9
> > > >   76: (b7) r0 = 0
> > > >   77: (15) if r1 == 0x0 goto pc+9
> > > >
> > > > I guess there is nothing for sanitize_dead_code to do because my
> > > > conditional is "if (likely(some_condition)) { do something }" and the
> > > > branch instruction itself is '.seen' by the verifier.
> 
> > > I bet your variable is not 'const'.
> > > Please see any of the progs in selftests that do:
> > > const volatile int var = 123;
> > > to express configs.
> 
> > Yeah, I was testing against the following:
> 
> > 	extern int CONFIG_XYZ __kconfig __weak;
> 
> > But I ended up writing this small reproducer:
> 
> > 	struct __sk_buff;
> 
> > 	const volatile int CONFIG_DROP = 1; // volatile so it's not
> > 					    // optimized away by clang
> 
> > 	__attribute__((section("tc"), used))
> > 	int my_config(struct __sk_buff *skb)
> > 	{
> > 		int ret = 0; /*TC_ACT_OK*/
> 
> > 		if (CONFIG_DROP)
> > 			ret = 2 /*TC_ACT_SHOT*/;
> 
> > 		return ret;
> > 	}
> 
> > $ bpftool map dump name my_confi.rodata
> 
> > [{
> >          "value": {
> >              ".rodata": [{
> >                      "CONFIG_DROP": 1
> >                  }
> >              ]
> >          }
> >      }
> > ]
> 
> > $ bpftool prog dump xlated name my_config
> 
> > int my_config(struct __sk_buff * skb):
> > ; if (CONFIG_DROP)
> >     0: (18) r1 = map[id:3][0]+0
> >     2: (61) r1 = *(u32 *)(r1 +0)
> >     3: (b4) w0 = 1
> > ; if (CONFIG_DROP)
> >     4: (64) w0 <<= 1
> > ; return ret;
> >     5: (95) exit
> 
> > The branch is gone, but the map lookup is still there :-(
> 
> Attached is another RFC below which does the same but from the verifier
> side. It seems we should be able to resolve LD+LDX if their dst_reg
> is the same? If they are different, we should at least be able to
> pre-resolve the LDX value. Would something like this work (I haven't run
> the full verifier/test_progs yet)?
> 
> (note, in this case, with kconfig, I still see the branch)
> 
>  test_fold_ro_ldx:PASS:open 0 nsec
>  test_fold_ro_ldx:PASS:load 0 nsec
>  test_fold_ro_ldx:PASS:bpf_obj_get_info_by_fd 0 nsec
>  int fold_ro_ldx(struct __sk_buff * skb):
>  ; if (CONFIG_DROP)
>     0: (b7) r1 = 1
>     1: (b4) w0 = 1
>  ; if (CONFIG_DROP)
>     2: (16) if w1 == 0x0 goto pc+1
>     3: (b4) w0 = 2
>  ; return ret;
>     4: (95) exit
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  #66      fold_ro_ldx:OK
> 
> Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> ---
>  kernel/bpf/verifier.c                         | 74 ++++++++++++++++++-
>  .../selftests/bpf/prog_tests/fold_ro_ldx.c    | 52 +++++++++++++
>  .../testing/selftests/bpf/progs/fold_ro_ldx.c | 20 +++++
>  3 files changed, 144 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
>  create mode 100644 tools/testing/selftests/bpf/progs/fold_ro_ldx.c
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c59c3df0fea6..ffedd8234288 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12695,6 +12695,69 @@ static bool bpf_map_is_cgroup_storage(struct bpf_map *map)
>  		map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
>  }
> 
> +/* if the map is read-only, we can try to fully resolve the load */
> +static bool fold_ro_pseudo_ldimm64(struct bpf_verifier_env *env,
> +				   struct bpf_map *map,
> +				   struct bpf_insn *insn)
> +{
> +	struct bpf_insn *ldx_insn = insn + 2;
> +	int dst_reg = ldx_insn->dst_reg;
> +	u64 val = 0;
> +	int size;
> +	int err;
> +
> +	if (!bpf_map_is_rdonly(map) || !map->ops->map_direct_value_addr)
> +		return false;
> +
> +	/* 0: BPF_LD  r=MAP
> +	 * 1: BPF_LD  r=MAP
> +	 * 2: BPF_LDX r=MAP->VAL
> +	 */
> +
> +	if (BPF_CLASS((insn+0)->code) != BPF_LD ||
> +	    BPF_CLASS((insn+1)->code) != BPF_LD ||
> +	    BPF_CLASS((insn+2)->code) != BPF_LDX)
> +		return false;
> +
> +	if (BPF_MODE((insn+0)->code) != BPF_IMM ||
> +	    BPF_MODE((insn+1)->code) != BPF_IMM ||
> +	    BPF_MODE((insn+2)->code) != BPF_MEM)
> +		return false;
> +
> +	if (insn->src_reg != BPF_PSEUDO_MAP_VALUE &&
> +	    insn->src_reg != BPF_PSEUDO_MAP_IDX_VALUE)
> +		return false;
> +
> +	if (insn->dst_reg != ldx_insn->src_reg)
> +		return false;
> +
> +	if (ldx_insn->off != 0)
> +		return false;
> +
> +	size = bpf_size_to_bytes(BPF_SIZE(ldx_insn->code));
> +	if (size < 0 || size > 4)
> +		return false;
> +
> +	err = bpf_map_direct_read(map, (insn+1)->imm, size, &val);
> +	if (err)
> +		return false;
> +
> +	if (insn->dst_reg == ldx_insn->dst_reg) {
> +		/* LDX is using the same destination register as LD.
> +		 * This means we are not interested in the map
> +		 * pointer itself and can remove it.
> +		 */
> +		*(insn + 0) = BPF_JMP_A(0);
> +		*(insn + 1) = BPF_JMP_A(0);
> +		*(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
Have you figured out why the branch is not removed
with BPF_ALU64_IMM(BPF_MOV)?

Can it also support 8 bytes (BPF_DW)? Is it because there
is not enough space for ld_imm64? So I wonder if this
patching can be done in do_misc_fixups() instead.
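
For the same-dst_reg case the two ld_imm64 slots are already there, so
something like this might work for BPF_DW (untested sketch):

	if (insn->dst_reg == ldx_insn->dst_reg) {
		/* Reuse both ld_imm64 slots for the folded 64-bit
		 * constant and nop out the LDX.
		 */
		struct bpf_insn ld[2] = { BPF_LD_IMM64(dst_reg, val) };

		*(insn + 0) = ld[0];
		*(insn + 1) = ld[1];
		*(insn + 2) = BPF_JMP_A(0);
		return true;
	}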

> +		return true;
> +	}
> +
> +	*(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
> +	/* Only LDX can be resolved, we still have to resolve LD address. */
> +	return false;
> +}


