On 11/22/22 9:29 PM, Yonghong Song wrote:
From: Martin KaFai Lau <martin.lau@xxxxxxxxx>
Date: Tuesday, November 22, 2022 at 5:53 PM
To: Yonghong Song <yhs@xxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxxxxxx>, Andrii Nakryiko <andrii@xxxxxxxxxx>, Daniel Borkmann <daniel@xxxxxxxxxxxxx>, Kernel Team <kernel-team@xxxxxxxx>, Martin KaFai Lau <martin.lau@xxxxxxxxxx>, bpf@xxxxxxxxxxxxxxx <bpf@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH bpf-next v8 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock()
On 11/22/22 5:39 PM, Martin KaFai Lau wrote:
On 11/22/22 5:13 PM, Yonghong Song wrote:
On 11/22/22 4:56 PM, Martin KaFai Lau wrote:
On 11/22/22 11:53 AM, Yonghong Song wrote:
+SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
+int task_acquire(void *ctx)
+{
+ struct task_struct *task, *real_parent;
+
+ task = bpf_get_current_task_btf();
+ bpf_rcu_read_lock();
+ real_parent = task->real_parent;
+ /* acquire a reference which can be used outside rcu read lock region */
+ real_parent = bpf_task_acquire(real_parent);
Does the bpf_task_acquire() kfunc need a change to do refcount_inc_not_zero()
and KF_RET_NULL?
We have this definition in the kernel:
BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
So the argument must be trusted: it is either marked PTR_TRUSTED/MEM_ALLOC or
already has a reference acquired, so I guess we should be fine here.
The verifier part is fine on {KF_TRUSTED_ARGS, PTR_TRUSTED}.
iiuc, PTR_TRUSTED means the kfunc can safely dereference the pointer because the
ptr has not been freed yet, but it does not mean that its refcnt > 0 or that it
is not on its way to be freed after the rcu gp.
If real_parent's refcnt is 0 here, bpf_task_acquire() will resurrect a task
which is on its way to be freed and the task can be stored in a map, so a UAF.
I see. Maybe we need strong trusted vs. weak trusted variants. Strong trusted
means refcnt > 0 and weak means no guarantee? Or we consider everything as weak
and try to grab a reference anyway? In most if not all cases, "current" should
represent a strong trusted btf_id I guess.
yeah, "current" task here is fine. current->real_parent is questionable.
imo, I think this check may be better done at runtime. The bpf_*_acquire() kfunc
should always do refcount_inc_not_zero() + KF_RET_NULL. Otherwise, we may end up
having to tag which ctx has a zero/non-zero refcnt. e.g. in the
security_sk_alloc() hook, the sk's refcnt is 0 and the kernel later does a
refcount_set(&sk->sk_refcnt, 1).
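
To make the difference concrete, here is a minimal userspace sketch of the two
behaviours discussed above, using C11 atomics. It is only a model: struct obj,
obj_get_unchecked(), obj_get_not_zero(), obj_acquire() and obj_put() are
hypothetical names, not kernel APIs, and the kernel's refcount_t has extra
saturation and warning logic that is not modeled here. It assumes that an
acquire kfunc returning NULL on failure is what KF_RET_NULL would expose to the
program.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted kernel object. */
struct obj {
	atomic_int refcnt;
};

/*
 * Unconditional increment. If the count has already dropped to 0, this
 * "resurrects" an object that is on its way to being freed.
 */
static void obj_get_unchecked(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

/*
 * Increment only while the count is still non-zero. This models the
 * semantics of refcount_inc_not_zero(), not its actual implementation.
 */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Acquire-style helper: returns NULL when no reference could be taken. */
static struct obj *obj_acquire(struct obj *o)
{
	return obj_get_not_zero(o) ? o : NULL;
}

/* Drop one reference; returns true when the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

With this model, a caller of obj_acquire() has to check for NULL before using
or storing the pointer, which is the same burden KF_RET_NULL would place on a
BPF program calling an acquire kfunc. An object whose count is already 0 is
never handed back, whereas obj_get_unchecked() would raise its count to 1.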
This could be addressed as a follow up though since it is not specific to this set.
Right, we have the same potential problem for both task and cgroup acquire functions.