Re: [PATCH bpf-next v4 01/16] bpf: implement lookup-free direct value access for maps

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sat, 6 Apr 2019 19:57:45 -0700

On Sun, Apr 07, 2019 at 01:55:14AM +0200, Daniel Borkmann wrote:
> On 04/06/2019 06:54 PM, Alexei Starovoitov wrote:
> > On Sat, Apr 06, 2019 at 12:58:23PM +0200, Daniel Borkmann wrote:
> >> On 04/06/2019 03:56 AM, Alexei Starovoitov wrote:
> >>> On Fri, Apr 05, 2019 at 10:59:27PM +0200, Daniel Borkmann wrote:
> >>>>  
> >>>> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> >>>> +/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> >>>> + * two extensions:
> >>>> + *
> >>>> + * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
> >>>> + * insn[0].imm:      map fd              map fd
> >>>> + * insn[1].imm:      0                   offset into value
> >>>> + * insn[0].off:      0                   lower 16 bit of map index
> >>>> + * insn[1].off:      0                   higher 16 bit of map index
> >>>> + * ldimm64 rewrite:  address of map      address of map[index]+offset
> >>>> + * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
> >>> ...
> >>>> +	else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
> >>>> +		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> >>>> +			 "map[id:%u][%u]+%u", insn->imm,
> >>>> +			 ((__u32)(__u16)insn[0].off) |
> >>>> +			 ((__u32)(__u16)insn[1].off) << 16,
> >>>> +			 (insn + 1)->imm);
> >>>
> >>> Hopefully one last nit...
> >>> Do we really need to allow this odd split index support?
> >>> Later patches enforce array of 1 element and libbpf only uses that.
> >>> This index feature feels too quirky and not really useful at this moment.
> >>> Can we enforce that insn[0|1].off == 0 instead ?
> >>> Later we can extend it to mean index without breaking anything.
> >>
> >> I originally didn't have it in v2 of the series, but I ended up
> >> implementing it after feedback from Andrii back then complaining
> >> that it's too specific and not generic enough. I agreed with him
> >> that the limitation of max_elems = 1 wasn't too nice, so I went
> >> to implement that full 32 bit index can be used thus that it has
> >> the potential of efficient map lookup replacement for array maps in
> >> general which is quite nice since within single insn it allows to
> >> select index and offset into value all as simple 64 bit imm load.
> > 
> > I missed this discussion.
> > It sort-of sounds nice from kernel side, but how one can use it
> > from bpf program written in C ?
> > If it's assembler only feature, I'd rather not do it.
> > statc int ar1[N];
> > and 
> > static struct S { ...} ar2[M];
> > will still be normal map of 1 element from llvm side.
> > Right now there is no support for variable length access
> > into static vars. When it's added in the future the ar[var] will be
> > some base offset into map of 1 element plus register addition.
> > So no opportunity to use 'index'.
> > bpf_map_lookup_elem(map, &key); from program side passes
> > a pointer and it's a function call. Even if key is constant
> > register spill/fill due to function call caused perf loss,
> > so extra 'index' optimization won't buy much.
> > We can introduce some special intrinsic/builtin to support
> > this 'index' from C, but it's not pretty.
> > So far I couldn't come up with C example that can use such 'index' feature.
> 
> Wrt 'index' was mostly thinking into llvm-builtin direction for how
> it could be consumed from C aside from loaders. One example for complex
> networking programs that comes to mind right away would be mib-style error
> counters similarly we have in the stack where the counter acts as index for
> the direct lookup. 

how would such builtin look? I still don't see it fiting into C code.

> Another option [for loaders] could be (given the sections
> are for the whole object) to allow an option for some of the programs in
> a given object to have private .bss/.data/.rodata sections by allocating
> max_elems > 1 and selecting the target buffer via index given nothing
> changes in terms of size or vars. I still think it's a useful extension
> to carry and keeps the direct value access a generic implementation.

I don't get this shadow vs normal .data idea.
The more we talk the more I'm convinced that this is not a good api.
Say in the future we indeed have these shadow + normal .data
then just use the same insn->imm field to refer to shadow part.
Even if there are N such regions. The value_size is known.
So use 0<=imm<value_size to refer to 'index' 0 and
value_size<=imm<value_size*2 to refer to 'index' 1.
There is absolutely no need for offset and index to be separate.
Address of a byte inside bpf array can be expressed with single integer.