On Fri, Apr 10, 2020 at 1:51 AM Jann Horn <jannh@xxxxxxxxxx> wrote: > > On Fri, Apr 10, 2020 at 2:04 AM Andrii Nakryiko <andriin@xxxxxx> wrote: > > VM_MAYWRITE flag during initial memory mapping determines if already mmap()'ed > > pages can be later remapped as writable ones through mprotect() call. To > > prevent user application to rewrite contents of memory-mapped as read-only and > > subsequently frozen BPF map, remove VM_MAYWRITE flag completely on initially > > read-only mapping. > > > > Alternatively, we could treat any memory-mapping on unfrozen map as writable > > and bump writecnt instead. But there is little legitimate reason to map > > BPF map as read-only and then re-mmap() it as writable through mprotect(), > > instead of just mmap()'ing it as read/write from the very beginning. > > > > Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY") > > Reported-by: Jann Horn <jannh@xxxxxxxxxx> > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx> > > --- > > kernel/bpf/syscall.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > index 64783da34202..f7f6db50a085 100644 > > --- a/kernel/bpf/syscall.c > > +++ b/kernel/bpf/syscall.c > > @@ -635,6 +635,10 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma) > > /* set default open/close callbacks */ > > vma->vm_ops = &bpf_map_default_vmops; > > vma->vm_private_data = map; > > + vma->vm_flags &= ~VM_MAYEXEC; > > + if (!(vma->vm_flags & VM_WRITE)) > > + /* disallow re-mapping with PROT_WRITE */ > > + vma->vm_flags &= ~VM_MAYWRITE; > > The .open and .close handlers for the VMA are also wrong: Yes, it has to check VM_MAYWRITE now, my bad, thanks for catching this! Extended selftest to validate that scenario as well. > > /* called for any extra memory-mapped regions (except initial) */ > static void bpf_map_mmap_open(struct vm_area_struct *vma) > { > struct bpf_map *map = vma->vm_file->private_data; > > bpf_map_inc_with_uref(map); > > if (vma->vm_flags & VM_WRITE) { > mutex_lock(&map->freeze_mutex); > map->writecnt++; > mutex_unlock(&map->freeze_mutex); > } > } > > /* called for all unmapped memory region (including initial) */ > static void bpf_map_mmap_close(struct vm_area_struct *vma) > { > struct bpf_map *map = vma->vm_file->private_data; > > if (vma->vm_flags & VM_WRITE) { > mutex_lock(&map->freeze_mutex); > map->writecnt--; > mutex_unlock(&map->freeze_mutex); > } > > bpf_map_put_with_uref(map); > } > > static const struct vm_operations_struct bpf_map_default_vmops = { > .open = bpf_map_mmap_open, > .close = bpf_map_mmap_close, > }; > > You can use mprotect() to flip VM_WRITE off while a VMA exists, and > then the writecnt won't go down when you close it. Or you could even > get the writecnt to go negative if you map as writable, then > mprotect() to readonly, then split the VMA a few times, mprotect the > split VMAs to writable, and then unmap them. > > I think you'll want to also check for MAYWRITE here. > > Also, the bpf_map_inc_with_uref/bpf_map_put_with_uref here look > superfluous - the VMA holds a reference to the file, and the file > holds a reference to the map. Hm.. So the file from which memory-mapping was created originally will stay referenced by VMA subsystem until the last VMA segment is unmmaped (and bpf_map_mmap_close is called for it), even if file itself is closed from user-space? It makes sense, though I didn't realize it at the time I was implementing this. I'll drop refcounting then.