Re: [PATCH bpf] bpf: Fix UAF in task local storage

KP Singh <kpsingh@xxxxxxxxxx> · Thu, 1 Jun 2023 20:24:40 +0200

On Thu, Jun 1, 2023 at 7:47 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 6/1/23 9:54 AM, Song Liu wrote:
> >
> >
> >> On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@xxxxxxxxxx> wrote:
> >>
> >> On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@xxxxxxxxxx> wrote:
> >>>
> >>> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@xxxxxxxxxx> wrote:
> >>>>
> >>>> When the task local storage was generalized for tracing programs, the
> >>>> bpf_task_local_storage callback was moved from a BPF LSM hook callback
> >>>> for security_task_free LSM hook to its own callback. But a failure case
> >>>> in bad_fork_cleanup_security was missed which, when triggered, led to a dangling
> >>>> task owner pointer and a subsequent use-after-free.
> >>>>
> >>>> This issue was noticed when a BPF LSM program was attached to the
> >>>> task_alloc hook on a kernel with KASAN enabled. The program used
> >>>> bpf_task_storage_get to copy the task local storage from the current
> >>>> task to the new task being created.
> >>>
> >>> This is pretty tricky. Let's add a selftest for this.
> >>
> >> I don't have an easy repro for this (the UAF does not trigger
> >> immediately). Also, I am not sure how one would test a UAF in a
> >> selftest. What actually happens is:
> >>
> >> * We have a dangling task pointer in local storage.
> >> * This is used sometime later which then leads to weird memory
> >> corruption errors.
> >
> > I think we will see it easily with KASAN, no?

No, the issue only happens when copy_process fails for some reason
(which one can possibly trigger with error injection / fexit) and then
somehow triggers the UAF.
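
To make the shape of the problem concrete, here is a toy userspace model,
not kernel code. Every name in it is hypothetical. It assumes only what the
commit message describes: the storage keeps a back-pointer to its owning
task, and the fork error path has to break that link before the task goes
away.

```c
#include <stdlib.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
struct toy_task;

struct toy_storage {
	struct toy_task *owner;	/* back-pointer to the owning task */
};

struct toy_task {
	struct toy_storage *storage;
};

/* Link a storage object to its owning task. */
static void toy_storage_attach(struct toy_task *t, struct toy_storage *s)
{
	s->owner = t;
	t->storage = s;
}

/* Break the link so nothing points at the task any more. */
static void toy_storage_detach(struct toy_task *t)
{
	if (t->storage) {
		t->storage->owner = NULL;
		t->storage = NULL;
	}
}

/* Error path of a hypothetical fork: detach first, then free the task. */
static void toy_fork_cleanup(struct toy_task *t)
{
	toy_storage_detach(t);	/* skipping this leaves the owner pointer dangling */
	free(t);
}
```

If the detach step is skipped, the storage outlives the task with a stale
owner pointer, and nothing goes wrong until something dereferences it later.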

Even if one does manage to trigger the KASAN warning, we won't fail
the selftest, so I don't see this in the selftest territory TBH. What
do you have in mind?

> >
> >>
> >>>
> >>>>
> >>>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs")
> >>>> Reported-by: Kuba Piecuch <jpiecuch@xxxxxxxxxx>
> >>>> Signed-off-by: KP Singh <kpsingh@xxxxxxxxxx>
> >>>> ---
> >>>>
> >>>> This fixes the regression from the LSM blob based implementation. We can
> >>>> still have UAFs if bpf_task_storage_get is invoked after bpf_task_storage_free
> >>>> in the cleanup path.
> >>>
> >>> Can we fix this by calling bpf_task_storage_free() from free_task()?
> >>
> >> I think we can, yeah. But this is yet another deviation from how the
> >> security blob is managed (security_task_free frees the blob that we
> >> were previously using).
>
> Does it mean doing bpf_task_storage_free() in free_task() will break some use
> cases? Could you explain?
> Doing bpf_task_storage_free() in free_task() seems to be more straightforward.

Superficially, I don't see any issues. All I am saying is that,
before we generalized task local storage, it was allocated and freed
as a security blob and now it's deviating further. Should we just
consider moving security_task_free into task_free then?

>
> >
> > Yeah, this will make the code even more tricky.
> >
> > Another idea I had is to filter on task->__state in the helper. IOW,

bailing out on __state == TASK_DEAD should be reasonable.

> > task local storage does not work on starting or dead tasks. But I am
> > not sure whether this will make BPF_LSM less effective (not covering
> > certain tasks).

As long as the task local storage is usable in LSM hooks like
security_task_alloc, it's okay.
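
As a rough illustration of that filter, here is a userspace toy model, not
the actual helper. The enum, struct and function names are hypothetical
stand-ins for task->__state and the storage lookup. A task that is still
being set up keeps working, and only a dead task is rejected.

```c
#include <stddef.h>

/* Toy userspace model; the names below are hypothetical, not kernel APIs. */
enum toy_task_state { TOY_TASK_NEW, TOY_TASK_RUNNING, TOY_TASK_DEAD };

struct toy_task {
	enum toy_task_state state;
	void *storage;		/* stands in for the task's local storage */
};

/* Return the storage, or NULL if the task is missing or already dead. */
static void *toy_task_storage_get(struct toy_task *t)
{
	if (!t || t->state == TOY_TASK_DEAD)
		return NULL;
	return t->storage;
}
```

Filtering only on the dead state, and not on the starting one, is what keeps
the task_alloc use case covered in this sketch.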

> >
> > Thanks,
> > Song
> >
> >
>