On 11/20/22 6:10 PM, John Fastabend wrote:
Yonghong Song wrote:
Currently, a non-tracing bpf program typically has a single 'context' argument
with a predefined uapi struct type. Through this uapi struct, the user is able
to access other fields defined in uapi header. Inside the kernel, the
user-seen 'context' argument is replaced with 'kernel context' (or 'kctx'
in short) which can access more information than what uapi header provides.
To access other info not in uapi header, people typically do two things:
(1). extend uapi to access more fields rooted from 'context'.
(2). use bpf_probe_read_kernel() helper to read particular field based on
kctx.
Using (1) needs uapi change and using (2) makes code more complex since
direct memory access is not allowed.
There are already a few instances trying to access more information from
kctx:
. trying to access some fields from perf_event kctx ([1]).
. trying to access some fields from xdp kctx ([2]).
This patch set tried to allow direct memory access for kctx fields
by introducing bpf_cast_to_kern_ctx() kfunc.
Martin mentioned a use case like type casting below:
#define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
basically an 'unsigned char *' cast to 'struct skb_shared_info *'. This patch
set tries to support such a use case as well with bpf_rdonly_cast().
For the patch series, Patch 1 added support for a kfunc available to all
prog types. Patch 2 added bpf_cast_to_kern_ctx() kfunc. Patch 3 added
bpf_rdonly_cast() kfunc. Patch 4 added a few positive and negative tests.
[1] https://lore.kernel.org/bpf/ad15b398-9069-4a0e-48cb-4bb651ec3088@xxxxxxxx/
[2] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@xxxxxxxxx/
Changelog:
v3 -> v4:
- remove unnecessary bpf_ctx_convert.t error checking
- add and use meta.ret_btf_id instead of meta.arg_constant.value for
bpf_cast_to_kern_ctx().
- add PTR_TRUSTED to the return PTR_TO_BTF_ID type for bpf_cast_to_kern_ctx().
v2 -> v3:
- rebase on top of bpf-next (for merging conflicts)
- add the selftest to s390x deny list
rfcv1 -> v2:
- break original one kfunc into two.
- add missing error checks and error logs.
- adapt to the new conventions in
https://lore.kernel.org/all/20221118015614.2013203-1-memxor@xxxxxxxxx/
for example, with __ign and __k suffix.
- added support in fixup_kfunc_call() to replace kfunc calls with a single mov.
Yonghong Song (4):
bpf: Add support for kfunc set with common btf_ids
bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx
bpf: Add a kfunc for generic type cast
bpf: Add type cast unit tests
Thanks Yonghong! Ack for the series from me, but looks like Alexei is
quick.
From my side, this allows us to pull in the dev info and from that get
the netns, so it fixes a gap where we had to split into a kprobe + xdp.
If we can get a pointer to the recv queue then with a few reads we
get the hash, vlan, etc. (see timestamp thread)
Thanks, John. Glad to see it is useful.
And then last bit is if we can get a ptr to the net ns list, plus
Unfortunately, currently vmlinux btf does not have non-percpu global
variables, so net_namespace_list is not available to bpf programs.
But I think we could do the following as a workaround, with a little bit
of initial user space involvement.
In the bpf program, we could have a global variable
    __u64 net_namespace_list;
and user space can look up net_namespace_list in /proc/kallsyms
and assign it to the bpf program's 'net_namespace_list' before prog load.
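A rough user-space sketch of the lookup step, in case it helps. The function names kallsyms_match() and kallsyms_lookup() are made up for illustration, and the only thing assumed about /proc/kallsyms is the usual "<hex address> <type> <name>" line layout:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/kallsyms line ("<hex addr> <type> <name> [module]").
 * Returns 1 and stores the address if the symbol name matches, else 0.
 * (Hypothetical helper, not part of any library.) */
int kallsyms_match(const char *line, const char *want,
                   unsigned long long *addr)
{
    unsigned long long a;
    char type;
    char name[256];

    assert(line && want && addr);
    if (sscanf(line, "%llx %c %255s", &a, &type, name) != 3)
        return 0;
    if (strcmp(name, want) != 0)
        return 0;
    *addr = a;
    return 1;
}

/* Scan /proc/kallsyms for a symbol. Returns 0 on success, -1 otherwise.
 * Without enough privileges (kptr_restrict), addresses read back as 0,
 * which is treated as a failure here. */
int kallsyms_lookup(const char *want, unsigned long long *addr)
{
    char line[512];
    FILE *f = fopen("/proc/kallsyms", "r");
    int found = 0;

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof(line), f))
        found = kallsyms_match(line, want, addr);
    fclose(f);
    return (found && *addr != 0) ? 0 : -1;
}
```

With a libbpf skeleton, the result would typically be stored into the program's global between open and load, along the lines of skel->bss->net_namespace_list = addr (skeleton name hypothetical).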
After that, you could implement an in-bpf-prog iterator with a bounded
loop to ensure it eventually terminates. You can use
    struct list_head *lh = bpf_rdonly_cast((void *)net_namespace_list,
                                           struct_list_head_btf_id);
to cast to a struct list_head pointer. From there you can trace down
the list, using bpf_rdonly_cast() as needed to cast to the element type.
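To make the bounded-loop idea concrete, here is a plain user-space C model of the walk, not a BPF program. struct net_model, its id field, MAX_NETNS and walk_netns() are all hypothetical stand-ins. In a real bpf prog each pointer dereference would go through bpf_rdonly_cast() with the proper btf id, which this model does not attempt to show:

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for the kernel's struct net, reduced to what
 * the demo needs: an id and the list linkage. */
struct net_model {
    unsigned int id;
    struct list_head list;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_NETNS 64 /* assumed upper bound on visited entries */

/* Walk the circular list starting after 'head'. Stops when the walk
 * returns to 'head', after MAX_NETNS steps, or when 'ids' is full,
 * so it always terminates. Returns the number of ids stored. */
int walk_netns(struct list_head *head, unsigned int *ids, int cap)
{
    struct list_head *pos;
    struct net_model *net;
    int i, n = 0;

    assert(head && ids);
    pos = head->next;
    for (i = 0; i < MAX_NETNS && n < cap; i++) {
        if (pos == head)
            break;
        net = container_of(pos, struct net_model, list);
        ids[n++] = net->id;
        pos = pos->next;
    }
    return n;
}
```

The fixed iteration cap is what keeps the loop bounded regardless of the list contents: even if the list never loops back to head, the walk still ends.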
the rcu patch we can build the net ns iterator directly in BPF
I just posted the rcu patch
https://lore.kernel.org/bpf/20221121170515.1193967-1-yhs@xxxxxx/
Please help take a look whether it can serve your need.
which seems stronger than an iterator IMO because we can kick it
off on events anywhere in the kernel. Or, based on an event, kick off
some specific iterator (e.g. walk net_devs in netns X with SR-IOV
interfaces). Ideally we would also wire it up to timers so we
can call it every N seconds without any user space intervention.
Eventually, it's nice if user space can crash, restart, and
so on without impacting the logic in the kernel.
Thanks again.