On Thu, Jan 19, 2023 at 11:21:33AM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 19, 2023 at 8:52 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >
> > On Tue, Jan 17, 2023 at 09:14:42PM -0800, Alexei Starovoitov wrote:
> > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > >
> > > There are several issues with copy_from_user_nofault():
> > >
> > > - access_ok() is designed for user context only and for that reason
> > >   it has WARN_ON_IN_IRQ() which triggers when bpf, kprobe, eprobe
> > >   and perf on ppc are calling it from irq.
> > >
> > > - it's missing nmi_uaccess_okay() which is a nop on all architectures
> > >   except x86 where it's required.
> > >   The comment in arch/x86/mm/tlb.c explains the details why it's necessary.
> > >   Calling copy_from_user_nofault() from bpf, [ke]probe without this check is not safe.
> > >
> > > - __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling
> > >   check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock()
> > >   which is not safe to do from bpf, [ke]probe and perf due to potential deadlock.
> >
> > Er, this drops check_object_size() -- that needs to stay. The vmap area
> > test in check_object_size is likely what needs fixing. It was discussed
> > before:
> > https://lore.kernel.org/lkml/YySML2HfqaE%2FwXBU@xxxxxxxxxxxxxxxxxxxx/
>
> Thanks for the link.
> Unfortunately all options discussed in that link won't work,
> since all of them rely on in_interrupt() which will not catch the condition.
> [ke]probe, bpf, perf can run after spin_lock is taken.
> Like via trace_lock_release tracepoint.
> It's only with lockdep=on, but still.
> Or via trace_contention_begin tracepoint with lockdep=off.
> check_object_size() will not execute in_interrupt().
>
> > The only reason it was ultimately tolerable to remove the check from
> > the x86-only _nmi function was because it was being used on compile-time
> > sized copies.
>
> It doesn't look to be the case.
> copy_from_user_nmi() is called via __output_copy_user by perf
> with run-time 'size'.

Perhaps this changed recently? It was only called in copy_code() before
when I looked last. Regardless, it still needs solving.

> > We need to fix the vmap lookup so the checking doesn't regress --
> > especially for trace, bpf, etc, where we could have much more interested
> > dest/source/size combinations. :)
>
> Well, for bpf the 'dst' is never a vmalloc area, so
> is_vmalloc_addr() and later spin_lock() in check_heap_object()
> won't trigger.
> Also for bpf the 'dst' area is statically checked by the verifier
> at program load time, so at run-time the dst pointer is
> guaranteed to be valid and of correct dimensions.
> So doing check_object_size() is pointless unless there is a bug
> in the verifier, but if there is a bug kasan and friends
> will find it sooner. The 'dst' checks are generic and
> not copy_from_user_nofault() specific.
>
> For trace, kprobe and perf would be nice to keep check_object_size()
> working, of course.
>
> What do you suggest?
> I frankly don't see other options other than done in this patch,
> though it's not great.
> Happy to be proven otherwise.

Matthew, do you have any thoughts on dealing with this? Can we use a
counter instead of a spin lock?

-Kees

-- 
Kees Cook