On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote: > > On Wed, 2 Nov 2022 at 03:06, David Vernet <void@xxxxxxxxxxxxx> wrote: > > > > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote: > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@xxxxxxxxxxxxx> wrote: > > > > > > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ? > > > > > > We've introduced / vs nf_conf specifically to express the relationship > > > > > > between allocated nf_conn and other nf_conn-s via different types. > > > > > > Why is this not enough? > > > > > > > > > > Kumar should have more context here (he originally suggested this in > > > > > [0]), > > > > > > > > Quoting: > > > > " > > > > Unfortunately a side effect of this change is that now since > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_* > > > > functions would begin working with tp_btf args. > > > > " > > > > I couldn't find any tracepoint that has nf_conn___init as an argument. > > > > The whole point of that new type was to return it to bpf prog, > > > > so the verifier type matches it when it's passed into bpf_ct_* > > > > in turn. > > > > So I don't see a need for a new OWNED flag still. > > > > If nf_conn___init is passed into tracepoint it's a bug and > > > > we gotta fix it. > > > > > > Yep, this is what I'm seeing as well. I think you're right that > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper > > > types is the way to enable an ownership model like this. > > > > > > > It's not just nf_conn___init. Some CT helpers also take nf_conn. > > e.g. bpf_ct_change_timeout, bpf_ct_change_status. > > Right now they are only allowed in XDP and TC programs, so the tracing > > args part is not a problem _right now_. > > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well. > > > So currently it may not be possible to pass such a trusted but > > ref_obj_id == 0 nf_conn to those helpers. > > But based on changes unrelated to this, it may become possible in the > > future to obtain such a trusted nf_conn pointer. > > From kfunc's pov trusted pointer means valid pointer. > It doesn't need to be ref_obj_id refcounted from the verifier pov. > It can be refcounted on the kernel side and it will be trusted. > The code that calls trace_*() passes only trusted pointers into tp-s. > If there is a tracepoint somewhere in the kernel that uses a volatile > pointer to potentially uaf kernel object it's a bug that should be fixed. > This is all fine. I'm asking you to distinguish between trusted-not-refcounted and trusted-and-refcounted. It is necessary for nf_conn, since the object can be reused if the refcount is not held. Some other CPU could be reusing the same memory and allocating a new nf_conn on it while we change its status. So it's not ok to call bpf_ct_change_timeout/status on trusted nf_conn, but only on trusted+refcounted nf_conn. Trusted doesn't capture the difference between 'valid' vs 'valid and owned by prog' anymore with the new definition for PTR_TO_BTF_ID. Yes, in most cases the tracepoints/tracing functions whitelisted will have the caller ensure that, but we should then allow trusted nf_conn in those hooks explicitly, not implicitly by default everywhere. Until then it should be restricted to ref_obj_id > 0 IMO as it is right now. > > It is a requirement of those kfuncs that the nf_conn has its refcount > > held while they are called. > > and it will be. Just not by the verifier. > > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore. > > It seems better to me to keep that restriction instead of relaxing it, > > if it is part of the contract. > > Disagree as explained above. > > > It is fine to not require people to dive into these details and just > > use KF_TRUSTED_ARGS in general, but we need something to cover special > > cases like these where the object is only stable while we hold an > > active refcount, RCU protection is not enough against reuse. > > This is not related to RCU. Let's not mix RCU concerns in here. > It's a different topic. > What I meant is that in the normal case, usually objects aren't reused while the RCU read lock is held. In case of nf_conn, the refcount needs to be held to ensure that, since it uses SLAB_TYPESAFE_BY_RCU. This is why bpf_ct_lookup needs to bump the refcount and match the key after that again, and cannot just return the looked up ct directly. > > It could be 'expert only' __ref suffix on the nf_conn arg, or > > KF_OWNED_ARGS, or something else. > > I'm still against that. > I understand (and agree) that you don't want to complicate things further. It's fine if you want to deal with this later when the above concern materializes. But it will be yet another thing to keep in mind for the future.