Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock

Liam Howlett <liam.howlett@xxxxxxxxxx> · Wed, 8 Sep 2021 18:30:52 +0000

* Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> [210908 14:21]:
> On Wed, Sep 8, 2021 at 11:15 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 8 Sep 2021 11:02:58 -0700 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > > > Please describe the expected userspace-visible change from Peter's
> > > > patch in full detail?
> > >
> > > User space expects build_id to be available. Peter patch simply removes
> > > that feature.
> >
> > Are you sure?  He ends up with
> 
> More than sure :)
> Just look at below.
> 
> > static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
> >                                           u64 *ips, u32 trace_nr, bool user)
> > {
> >         int i;
> >
> >         /* cannot access current->mm, fall back to ips */
> >         for (i = 0; i < trace_nr; i++) {
> >                 id_offs[i].status = BPF_STACK_BUILD_ID_IP;
> >                 id_offs[i].ip = ips[i];
> >                 memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
> >         }
> >         return;
> > }
> >
> > and you're saying that userspace won't like this because we didn't set
> > BPF_STACK_BUILD_ID_VALID?
> 
> The patch forces the "fallback path" that in production is seen 0.001%
> Meaning that user space doesn't see build_id any more. It sees IPs only.
> The user space cannot correlate IPs to binaries. That's what build_id enabled.

I was thinking of decomposing the checks in my first response to two
functions.

Something like this:
--------------

diff --git a/mm/mmap.c b/mm/mmap.c
index dce46105e3df..8afc1d22aa61 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2293,12 +2293,13 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 EXPORT_SYMBOL(get_unmapped_area);
 
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
-struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+struct vm_area_struct *find_vma_non_owner(struct mm_struct *mm,
+					 unsigned long addr)
 {
 	struct rb_node *rb_node;
 	struct vm_area_struct *vma;
 
-	mmap_assert_locked(mm);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 	/* Check the cache first. */
 	vma = vmacache_find(mm, addr);
 	if (likely(vma))
@@ -2325,6 +2326,11 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 	return vma;
 }
 
+struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+{
+	lockdep_assert_held(&mm->mmap_lock);
+	return find_vma_non_owner(mm, addr);
+}
 EXPORT_SYMBOL(find_vma);
 
 /*

--------------

Although this leaks more into the mm API and was referred to as ugly
previously, it does provide a working solution and still maintains the
same level of checking.

Would it push the back actors to just switch to non_owner though?


Thanks,
Liam