On 10/16/19 3:08 PM, Daniel Borkmann wrote:
> On 10/16/19 11:28 PM, Alexei Starovoitov wrote:
>> On Wed, Oct 16, 2019 at 2:22 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>>> On 10/16/19 5:25 AM, Alexei Starovoitov wrote:
>>>> libbpf analyzes bpf C program, searches in-kernel BTF for given type name
>>>> and stores it into expected_attach_type.
>>>> The kernel verifier expects this btf_id to point to something like:
>>>> typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
>>>> which represents signature of raw_tracepoint "kfree_skb".
>>>>
>>>> Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
>>>> and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
>>>> In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
>>>> and 'void *' in second case.
>>>>
>>>> Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
>>>> Like PTR_TO_SOCKET points to 'struct bpf_sock',
>>>> PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
>>>> PTR_TO_BTF_ID points to in-kernel structs.
>>>> If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
>>>> then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
>>>>
>>>> When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32)
>>>> the btf_struct_access() checks which field of 'struct sk_buff' is
>>>> at offset 32. Checks that size of access matches type definition
>>>> of the field and continues to track the dereferenced type.
>>>> If that field was a pointer to 'struct net_device' the r2's type
>>>> will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device'
>>>> in vmlinux's BTF.
>>>>
>>>> Such verifier analysis prevents "cheating" in BPF C program.
>>>> The program cannot cast arbitrary pointer to 'struct sk_buff *'
>>>> and access it. C compiler would allow type cast, of course,
>>>> but the verifier will notice type mismatch based on BPF assembly
>>>> and in-kernel BTF.
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>>>
>>> Overall set looks great!
>>>
>>> [...]
>>>> +int btf_struct_access(struct bpf_verifier_log *log,
>>>> +                      const struct btf_type *t, int off, int size,
>>>> +                      enum bpf_access_type atype,
>>>> +                      u32 *next_btf_id)
>>>> +{
>>>> +        const struct btf_member *member;
>>>> +        const struct btf_type *mtype;
>>>> +        const char *tname, *mname;
>>>> +        int i, moff = 0, msize;
>>>> +
>>>> +again:
>>>> +        tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
>>>
>>> More of a high-level question wrt btf_ctx_access(), is there a reason the ctx
>>> access is only done for raw_tp? I presume kprobes is still on todo (?), what
>>> about uprobes which also have pt_regs and could benefit from this work, but is
>>> not fixed to btf_vmlinux to search its ctx type.
>>
>> Optimized kprobes via ftrace entry point are on immediate todo list
>> to follow up. I'm still debating on the best way to handle it.
>> uprobes - I haven't thought about. Likely necessary as well.
>> Not sure what types to give to pt_regs yet.
>>
>>> I presume BPF_LDX | BPF_PROBE_MEM | BPF_* would need no additional encoding,
>>> but JIT emission would have to differ depending on the prog type.
>>
>> you mean for kprobes/uprobes? Why would it need to be different?
>> The idea was to keep LDX|PROBE_MEM as normal LDX|MEM load as much as
>> possible.
>
> Agree, makes sense.
>
>> The only difference vs normal load is to populate extable which is
>> arch dependent.
>
> Wouldn't you also need to switch to USER_DS similarly to what
> probe_kernel_read() vs probe_user_read() differentiates?

No. I don't think we should.
Here we're reading only kernel memory and shouldn't be messing with
addr_limit. No stac/clac and no access_ok() either.
It's kernel memory being read.
set_fs(KERNEL_DS) matters when access_ok() and get_user() are used by
a callee that normally takes a user address while the caller is passing
a kernel address. There is no such thing here.
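
For anyone following the quoted commit message, the offset-to-field check
that it attributes to btf_struct_access() can be modeled in user space.
This is only a rough sketch under loud assumptions: every toy_* name is
hypothetical, the struct layouts are invented, the ids 1234 and 456 are
just the example numbers from the commit message, and the lookup only
accepts an access that starts exactly at a member and matches its size.
It is not the kernel code and does not parse real BTF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type ids, borrowed from the example numbers in the thread. */
enum { TOY_ID_NONE = 0, TOY_ID_NET_DEVICE = 456, TOY_ID_SK_BUFF = 1234 };

/* Invented layouts; they do not match the real kernel structs. */
struct toy_net_device {
	char name[16];
	unsigned int mtu;
};

struct toy_sk_buff {
	void *next;
	void *prev;
	unsigned int len;
	struct toy_net_device *dev;
};

/* One row per struct member: where it lives, how big it is, and which
 * toy type id a load of it yields (TOY_ID_NONE for scalars and for
 * pointers this sketch does not track). */
struct toy_member {
	const char *name;
	size_t off;
	size_t size;
	int ptr_to;
};

static const struct toy_member toy_sk_buff_members[] = {
	{ "next", offsetof(struct toy_sk_buff, next), sizeof(void *), TOY_ID_NONE },
	{ "prev", offsetof(struct toy_sk_buff, prev), sizeof(void *), TOY_ID_NONE },
	{ "len",  offsetof(struct toy_sk_buff, len),  sizeof(unsigned int), TOY_ID_NONE },
	{ "dev",  offsetof(struct toy_sk_buff, dev),
	  sizeof(struct toy_net_device *), TOY_ID_NET_DEVICE },
};

/* Find the member at 'off', require the access size to match it, and
 * report the type id of the loaded value through *next_id.
 * Returns 0 on success, -1 if no member starts at 'off' or the size
 * does not match. */
static int toy_struct_access(const struct toy_member *members, size_t count,
			     size_t off, size_t size, int *next_id)
{
	assert(next_id != NULL);

	for (size_t i = 0; i < count; i++) {
		if (members[i].off != off)
			continue;
		if (members[i].size != size)
			return -1;	/* size mismatch: reject the load */
		*next_id = members[i].ptr_to;
		return 0;
	}
	return -1;			/* no member starts at this offset */
}

/* Convenience wrapper for the toy sk_buff table above. */
static int toy_sk_buff_access(size_t off, size_t size, int *next_id)
{
	size_t count = sizeof(toy_sk_buff_members) / sizeof(toy_sk_buff_members[0]);

	return toy_struct_access(toy_sk_buff_members, count, off, size, next_id);
}
```

In this model a pointer-sized load at offsetof(struct toy_sk_buff, dev)
succeeds and yields TOY_ID_NET_DEVICE, which is the analogue of r2
becoming PTR_TO_BTF_ID#456 in the commit message. A load with the wrong
size, or at an offset where no member starts, is rejected.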