On Thu, Jun 1, 2023 at 8:24 PM KP Singh <kpsingh@xxxxxxxxxx> wrote: > > On Thu, Jun 1, 2023 at 7:47 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > > > On 6/1/23 9:54 AM, Song Liu wrote: > > > > > > > > >> On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@xxxxxxxxxx> wrote: > > >> > > >> On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@xxxxxxxxxx> wrote: > > >>> > > >>> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@xxxxxxxxxx> wrote: > > >>>> > > >>>> When the the task local storage was generalized for tracing programs, the > > >>>> bpf_task_local_storage callback was moved from a BPF LSM hook callback > > >>>> for security_task_free LSM hook to it's own callback. But a failure case > > >>>> in bad_fork_cleanup_security was missed which, when triggered, led to a dangling > > >>>> task owner pointer and a subsequent use-after-free. > > >>>> > > >>>> This issue was noticed when a BPF LSM program was attached to the > > >>>> task_alloc hook on a kernel with KASAN enabled. The program used > > >>>> bpf_task_storage_get to copy the task local storage from the current > > >>>> task to the new task being created. > > >>> > > >>> This is pretty tricky. Let's add a selftest for this. > > >> > > >> I don't have an easy repro for this (the UAF does not trigger > > >> immediately), Also I am not sure how one would test a UAF in a > > >> selftest. What actually happens is: > > >> > > >> * We have a dangling task pointer in local storage. > > >> * This is used sometime later which then leads to weird memory > > >> corruption errors. > > > > > > I think we will see it easily with KASAN, no? > > No, the issue only happens when copy_process fails for some reason > (which one can possibly trigger with error injection / fexit) and then > somehow triggers the UAF. > > Even if one does manage to trigger the KASAN warning, we won't fail > the selftest, so I don't see this in the selftest territory TBH. What > do you have in mind? > > > > > > >> > > >>> > > >>>> > > >>>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") > > >>>> Reported-by: Kuba Piecuch <jpiecuch@xxxxxxxxxx> > > >>>> Signed-off-by: KP Singh <kpsingh@xxxxxxxxxx> > > >>>> --- > > >>>> > > >>>> This fixes the regression from the LSM blob based implementation, we can > > >>>> still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free > > >>>> in the cleanup path. > > >>> > > >>> Can we fix this by calling bpf_task_storage_free() from free_task()? > > >> > > >> I think we can yeah. But, this is yet another deviation from how the > > >> security blob is managed (security_task_free) frees the blob that we > > >> were previously using. > > > > Does it mean doing bpf_task_storage_free() in free_task() will break some use > > cases? Could you explain? > > Doing bpf_task_storage_free() in free_task() seems to be more straight forward. > > Superficially, I don't see any issues . All I am saying is that, > before we generalized task local storage, it was allocated and freed > as a security blob and now it's deviating further. Should we just > consider moving security_task_free into task_free then? For now, I am going to move bpf_task_storage_free into free_task as it's the immediate fix and respin. > > > > > > > > > Yeah, this will make the code even more tricky. > > > > > > Another idea I had is to filter on task->__state in the helper. IOW, > > bailing out on __state == TASK_DEAD should be reasonable. > > > > task local storage does not work on starting or died tasks. But I am > > > not sure whether this will make BPF_LSM less effective (not covering > > > certain tasks). > > As long as the task local storage is usable in LSM hooks like > security_task_alloc it's okay > > > > > > > Thanks, > > > Song > > > > > > > >