On 5/16/22 7:26 PM, Andrii Nakryiko wrote: > On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@xxxxxx> wrote: >> >> Handle xmm0,...,xmm15 registers when parsing USDT arguments. Currently >> only the first 64 bits of the fetched value are returned as I haven't >> seen the rest of the register used in practice. >> >> This patch also handles floats in USDT arg spec by ignoring the fact >> that they're floats and considering them scalar. Currently we can't do >> float math in BPF programs anyways, so might as well support passing to >> userspace and converting there. >> >> We can use existing ARG_REG sscanf + logic, adding XMM-specific logic >> when calc_pt_regs_off fails. If the reg is xmm, arg_spec's reg_off is >> repurposed to hold reg_no, which is passed to bpf_get_reg_val. Since the >> helper does the digging around in fxregs_state it's not necessary to >> calculate offset in bpf code for these regs. >> >> NOTE: Changes here cause verification failure for existing USDT tests. >> Specifically, BPF_USDT prog 'usdt12' fails to verify due to too many >> insns despite not having its insn count significantly changed. >> >> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx> >> --- >> tools/lib/bpf/usdt.bpf.h | 36 ++++++++++++++++++++-------- >> tools/lib/bpf/usdt.c | 51 ++++++++++++++++++++++++++++++++++++---- >> 2 files changed, 73 insertions(+), 14 deletions(-) >> >> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h >> index 4181fddb3687..7b5ed4cbaa2f 100644 >> --- a/tools/lib/bpf/usdt.bpf.h >> +++ b/tools/lib/bpf/usdt.bpf.h >> @@ -43,6 +43,7 @@ enum __bpf_usdt_arg_type { >> BPF_USDT_ARG_CONST, >> BPF_USDT_ARG_REG, >> BPF_USDT_ARG_REG_DEREF, >> + BPF_USDT_ARG_XMM_REG, >> }; >> >> struct __bpf_usdt_arg_spec { >> @@ -129,7 +130,9 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res) >> { >> struct __bpf_usdt_spec *spec; >> struct __bpf_usdt_arg_spec *arg_spec; >> - unsigned long val; >> + struct pt_regs *btf_regs; >> + struct task_struct *btf_task; >> + struct { __u64 a; __u64 unused; } val = {}; >> int err, spec_id; >> >> *res = 0; >> @@ -151,7 +154,7 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res) >> /* Arg is just a constant ("-4@$-9" in USDT arg spec). >> * value is recorded in arg_spec->val_off directly. >> */ >> - val = arg_spec->val_off; >> + val.a = arg_spec->val_off; >> break; >> case BPF_USDT_ARG_REG: >> /* Arg is in a register (e.g, "8@%rax" in USDT arg spec), >> @@ -159,7 +162,20 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res) >> * struct pt_regs. To keep things simple user-space parts >> * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off. >> */ >> - err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off); >> + err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off); >> + if (err) >> + return err; >> + break; >> + case BPF_USDT_ARG_XMM_REG: > > nit: a bit too XMM-specific name here, we probably want to keep it a bit Agreed. > >> + /* Same as above, but arg is an xmm reg, so can't look >> + * in pt_regs, need to use special helper. >> + * reg_off is the regno ("xmm0" -> regno 0, etc) >> + */ >> + btf_task = bpf_get_current_task_btf(); >> + btf_regs = (struct pt_regs *)bpf_task_pt_regs(btf_task); > > I'd like to avoid taking dependency on bpf_get_current_task_btf() for > rare case of XMM register, which makes it impossible to do USDT on > older kernels. It seems like supporting reading registers from current > (and maybe current pt_regs context) should cover a lot of practical > uses. > Yep. We talked about this today. Will remove. >> + err = bpf_get_reg_val(&val, sizeof(val), > > But regardless of the above, we'll need to use CO-RE to detect support > for this new BPF helper (probably using bpf_core_enum_value_exists()?) > to allow using USDTs on older kernels. > Will add. > >> + ((u64)arg_spec->reg_off + BPF_GETREG_X86_XMM0) << 32, >> + btf_regs, btf_task); >> if (err) >> return err; >> break; >> @@ -171,14 +187,14 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res) >> * from pt_regs, then do another user-space probe read to >> * fetch argument value itself. >> */ >> - err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off); >> + err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off); >> if (err) >> return err; >> - err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off); >> + err = bpf_probe_read_user(&val.a, sizeof(val.a), (void *)val.a + arg_spec->val_off); > > is the useful value in xmm register normally in lower 64-bits of it? > is it possible to just request reading just the first 64 bits from > bpf_get_reg_val() and avoid this ugly union? For USDT usecase, I've only seen lower 64 bits used. Should be possible to just grab those, will see if there's a clean way to integrate such an option. > >> if (err) >> return err; >> #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ >> - val >>= arg_spec->arg_bitshift; >> + val.a >>= arg_spec->arg_bitshift; >> #endif >> break; >> default: > > [...] > >> +static int calc_xmm_regno(const char *reg_name) >> +{ >> + static struct { >> + const char *name; >> + __u16 regno; >> + } xmm_reg_map[] = { >> + { "xmm0", 0 }, >> + { "xmm1", 1 }, >> + { "xmm2", 2 }, >> + { "xmm3", 3 }, >> + { "xmm4", 4 }, >> + { "xmm5", 5 }, >> + { "xmm6", 6 }, >> + { "xmm7", 7 }, >> +#ifdef __x86_64__ >> + { "xmm8", 8 }, >> + { "xmm9", 9 }, >> + { "xmm10", 10 }, >> + { "xmm11", 11 }, >> + { "xmm12", 12 }, >> + { "xmm13", 13 }, >> + { "xmm14", 14 }, >> + { "xmm15", 15 }, > > no-x86 arches parse this generically with sscanf(), seems like we can > do this simple approach here as well? > Agreed. > >> +#endif >> + }; >> + int i; >> + >> + for (i = 0; i < ARRAY_SIZE(xmm_reg_map); i++) { >> + if (strcmp(reg_name, xmm_reg_map[i].name) == 0) >> + return xmm_reg_map[i].regno; >> + } >> + >> return -ENOENT; >> } >> > > [...]