Re: [PATCH bpf-next] bpf: avoid holding freeze_mutex during mmap operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 24, 2025 at 11:56 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> We use map->freeze_mutex to prevent races between map_freeze() and
> memory mapping BPF map contents with writable permissions. The way we
> naively do this means we'll hold freeze_mutex for entire duration of all
> the mm and VMA manipulations, which is completely unnecessary. This can
> potentially also lead to deadlocks, as reported by syzbot in [0].
>
> So, instead, hold freeze_mutex only during writeability checks, bump
> (proactively) "write active" count for the map, unlock the mutex and
> proceed with mmap logic. And only if something went wrong during mmap
> logic, then undo that "write active" counter increment.
>
> Note, instead of checking VM_MAYWRITE we check VM_WRITE before and after
> mmaping, because we also have a logic that unsets VM_MAYWRITE
> forcefully, if VM_WRITE is not set. So VM_MAYWRITE could be set early on
> for read-only mmaping, but it won't be afterwards. VM_WRITE is
> a consistent way to detect writable mmaping in our implementation.

bpf_map_mmap_open/bpf_map_mmap_close use VM_MAYWRITE,

Do they need to change as well?

>   [0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@xxxxxxxxxx/
>
> Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
> Reported-by: syzbot+4dc041c686b7c816a71e@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  kernel/bpf/syscall.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 0daf098e3207..0d5b39e99770 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1035,7 +1035,7 @@ static const struct vm_operations_struct bpf_map_default_vmops = {
>  static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>  {
>         struct bpf_map *map = filp->private_data;
> -       int err;
> +       int err = 0;
>
>         if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record))
>                 return -ENOTSUPP;
> @@ -1059,7 +1059,12 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>                         err = -EACCES;
>                         goto out;
>                 }
> +               bpf_map_write_active_inc(map);
>         }
> +out:
> +       mutex_unlock(&map->freeze_mutex);
> +       if (err)
> +               return err;
>
>         /* set default open/close callbacks */
>         vma->vm_ops = &bpf_map_default_vmops;
> @@ -1070,13 +1075,14 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>                 vm_flags_clear(vma, VM_MAYWRITE);
>
>         err = map->ops->map_mmap(map, vma);
> -       if (err)
> -               goto out;
> +       if (err) {
> +               if (vma->vm_flags & VM_WRITE) {
> +                       mutex_lock(&map->freeze_mutex);
> +                       bpf_map_write_active_dec(map);
> +                       mutex_unlock(&map->freeze_mutex);

Extra lock/unlock looks unnecessary.

This functiona and map_freeze() need to see frozen and write_active coherent,
but write_active_dec looks like without mutex.
It's atomic64_dec.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux