Re: [PATCH v4 bpf-next 1/5] bpf: Remove redundant synchronize_rcu.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 29 Jun 2020 19:56:13 -0700

On Mon, Jun 29, 2020 at 06:08:48PM -0700, Andrii Nakryiko wrote:
> On Mon, Jun 29, 2020 at 5:58 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 29, 2020 at 5:35 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > >
> > > bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
> > > bpf_free_used_maps() is called after bpf prog is no longer executing:
> > > bpf_prog_put->call_rcu->bpf_prog_free->bpf_free_used_maps.
> > > Hence there is no need to call synchronize_rcu() to protect map elements.
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > ---
> >
> > Seems correct. And nice that maps don't have to care about this anymore.
> >
> 
> Actually, what about the map-in-map case?
> 
> What if you had an array-of-maps with an inner map element. It is the
> last reference to that map. Now you have two BPF prog executions in
> parallel. One looked up that inner map and is updating it at the
> moment. Another execution at the same time deletes that map. That
> deletion will call bpf_map_put(), which without synchronize_rcu() will
> free memory. All the while the former BPF program execution is still
> working with that map.

The inner map can only be deleted via sys_bpf(), and there we call
maybe_wait_bpf_programs() exactly to avoid this kind of problem.
It's also necessary for user space: when the user does a map_update/delete
of an inner map, as soon as the syscall returns the user can process the
old map with the guarantee that no bpf prog is still touching that inner map.
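
To make the ordering concrete, here is a rough userspace model of that
syscall-side path. It is only a sketch, not the kernel code: every model_*
name is made up for illustration, the grace-period wait is replaced by a
counter, and the assumption that only map-in-map types need the wait is my
reading of what maybe_wait_bpf_programs() checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel map types; not the real enum. */
enum model_map_type {
    MODEL_MAP_TYPE_HASH,
    MODEL_MAP_TYPE_ARRAY,
    MODEL_MAP_TYPE_ARRAY_OF_MAPS,
    MODEL_MAP_TYPE_HASH_OF_MAPS,
};

struct model_map {
    enum model_map_type type;
};

/* Counts how many times the model "waited" for a grace period. */
static int grace_periods_waited;

/*
 * Stub for synchronize_rcu(). In the kernel this blocks until every
 * RCU reader that started earlier (including running bpf progs) has
 * finished. Here it only records that the wait was requested.
 */
static void model_synchronize_rcu(void)
{
    grace_periods_waited++;
}

/*
 * Assumed shape of maybe_wait_bpf_programs(): wait only when the map
 * being modified is an outer map of a map-in-map.
 */
static void model_maybe_wait_bpf_programs(const struct model_map *map)
{
    assert(map != NULL);
    if (map->type == MODEL_MAP_TYPE_ARRAY_OF_MAPS ||
        map->type == MODEL_MAP_TYPE_HASH_OF_MAPS)
        model_synchronize_rcu();
}

/*
 * Model of the syscall-side delete: unlink the element first, then
 * wait, and only then return to user space.
 */
static int model_sys_map_delete_elem(struct model_map *map)
{
    /* Step 1: unlink the element from the map (elided in this model). */

    /* Step 2: make sure no running prog can still see the old inner map. */
    model_maybe_wait_bpf_programs(map);

    /* Step 3: return; user space may now process the old inner map. */
    return 0;
}
```

In this model, calling model_sys_map_delete_elem() on an ARRAY_OF_MAPS or
HASH_OF_MAPS map bumps grace_periods_waited once before returning, while a
plain hash or array map returns without waiting.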