On Thu, Mar 24, 2022 at 10:30 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> +
> +struct __bpf_usdt_arg_spec {
> +	__u64 val_off;
> +	enum __bpf_usdt_arg_type arg_type;
> +	short reg_off;
> +	bool arg_signed;
> +	char arg_bitshift;
> +};
> +
> +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> +#define BPF_USDT_MAX_ARG_CNT 12
> +struct __bpf_usdt_spec {
> +	struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> +	__u64 usdt_cookie;
> +	short arg_cnt;
> +};
> +
> +__weak struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> +	__type(key, int);
> +	__type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs SEC(".maps");
> +
> +__weak struct {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__uint(max_entries, BPF_USDT_MAX_IP_CNT);
> +	__type(key, long);
> +	__type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs_ip_to_id SEC(".maps");

...

> +
> +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> + * Returns 0 on success; negative error, otherwise.
> + * On error *res is guaranteed to be set to zero.
> + */
> +__hidden __weak
> +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	struct __bpf_usdt_arg_spec *arg_spec;
> +	unsigned long val;
> +	int err, spec_id;
> +
> +	*res = 0;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -ESRCH;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -ESRCH;
> +
> +	if (arg >= spec->arg_cnt)
> +		return -ENOENT;
> +
> +	arg_spec = &spec->args[arg];
> +	switch (arg_spec->arg_type) {

Without bpf_cookie support in the kernel, each arg access costs two map lookups. With bpf_cookie it's a single lookup in an array map, which is fast. Multiply that cost by the number of args. Not a huge cost, but we can do better long term.

How about annotating bpf_cookie with PTR_TO_BTF_ID at prog load time, so that bpf_get_attach_cookie() returns PTR_TO_BTF_ID instead of a long?
This way bpf_get_attach_cookie() can return "struct __bpf_usdt_spec *". At attach time libbpf will provide a populated 'struct __bpf_usdt_spec' to the kernel, and the kernel will copy the struct's data into the bpf_link. At detach time that memory is freed.

Advantages:
- saves an array lookup at runtime
- no need to size the __bpf_usdt_specs map; that map is no longer needed, and users don't need to worry about maxing out BPF_USDT_MAX_SPEC_CNT
- libbpf doesn't need to populate the __bpf_usdt_specs map or allocate spec_id-s; it would keep one struct __bpf_usdt_spec per uprobe and pass it to the kernel at attach time to store in the bpf_link

"cookie as ptr_to_btf_id" is a generic mechanism to provide a blob of data to the bpf prog instead of a single "long". That blob can be read/write too, so it could serve as a per-program, per-attach-point scratch area. Similar to task/inode local storage... that would be (prog, attach_point) local storage.

Thoughts?