Re: [PATCH 2/8] uprobes: revamp uprobe refcounting and lifetime management

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 2 Aug 2024 08:03:47 -0700

On Fri, Aug 2, 2024 at 4:11 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Wed, Jul 31, 2024 at 02:42:50PM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > -/*
> > - * There could be threads that have already hit the breakpoint. They
> > - * will recheck the current insn and restart if find_uprobe() fails.
> > - * See find_active_uprobe().
> > - */
> > -static void delete_uprobe(struct uprobe *uprobe)
> > -{
> > -     if (WARN_ON(!uprobe_is_active(uprobe)))
> > -             return;
> > -
> > -     write_lock(&uprobes_treelock);
> > -     rb_erase(&uprobe->rb_node, &uprobes_tree);
> > -     write_unlock(&uprobes_treelock);
> > -     RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
> > -}
> > -
> >  struct map_info {
> >       struct map_info *next;
> >       struct mm_struct *mm;
> > @@ -1094,17 +1120,12 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
> >       int err;
> >
> >       down_write(&uprobe->register_rwsem);
> > -     if (WARN_ON(!consumer_del(uprobe, uc)))
> > +     if (WARN_ON(!consumer_del(uprobe, uc))) {
> >               err = -ENOENT;
> > -     else
> > +     } else {
> >               err = register_for_each_vma(uprobe, NULL);
> > -
> > -     /* TODO : cant unregister? schedule a worker thread */
> > -     if (!err) {
> > -             if (!uprobe->consumers)
> > -                     delete_uprobe(uprobe);
>
> ok, so removing this call is why the consumer test is failing, right?
>
> IIUC the previous behaviour was to remove uprobe from the tree
> even when there's active uprobe ref for installed uretprobe
>
> so following scenario will now behaves differently:
>
>   install uretprobe/consumer-1 on foo
>   foo {
>     remove uretprobe/consumer-1                (A)
>     install uretprobe/consumer-2 on foo        (B)
>   }
>
> before the removal of consumer-1 (A) would remove the uprobe object
> from the tree, so the installation of consumer-2 (b) would create
> new uprobe object which would not be triggered at foo return because
> it got installed too late (after foo uprobe was triggered)
>
> the behaviour with this patch is that removal of consumer-1 (A) will
> not remove the uprobe object (that happens only when we run out of
> refs), and the following install of consumer-2 will use the existing
> uprobe object so the consumer-2 will be triggered on foo return
>
> uff ;-)

yep, something like that

>
> but I think it's better, because we get more hits

note, with the next patch set that makes uretprobes SRCU protected
(but with timeout) the behavior becomes a bit time-sensitive :) so I
think we'll have to change your selftest to first attach all the new
uretprobes, then detach all the uretprobes that had to be detached,
and then do a bit more relaxed logic of the sort "if there were some
uretprobes before and after, then we *might* get uretprobe triggered
(but we might not as well, unless the same uretprobe stayed attached
at all times)".

Anyways, something to take care of in the bpf-next tree separately.

All this is very much an implementation detail, so I think we can
change these aspects freely.

>
> jirka
>
> > -             else
> > -                     err = -EBUSY;
> > +             /* TODO : cant unregister? schedule a worker thread */
> > +             WARN(err, "leaking uprobe due to failed unregistration");
> >       }
> >       up_write(&uprobe->register_rwsem);
> >
> > @@ -1159,27 +1180,16 @@ struct uprobe *uprobe_register(struct inode *inode,
> >       if (!IS_ALIGNED(ref_ctr_offset, sizeof(short)))
> >               return ERR_PTR(-EINVAL);
> >
> > - retry:
> >       uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
> >       if (IS_ERR(uprobe))
> >               return uprobe;
> >
> > -     /*
> > -      * We can race with uprobe_unregister()->delete_uprobe().
> > -      * Check uprobe_is_active() and retry if it is false.
> > -      */
> >       down_write(&uprobe->register_rwsem);
> > -     ret = -EAGAIN;
> > -     if (likely(uprobe_is_active(uprobe))) {
> > -             consumer_add(uprobe, uc);
> > -             ret = register_for_each_vma(uprobe, uc);
> > -     }
> > +     consumer_add(uprobe, uc);
> > +     ret = register_for_each_vma(uprobe, uc);
> >       up_write(&uprobe->register_rwsem);
> > -     put_uprobe(uprobe);
> >
> >       if (ret) {
> > -             if (unlikely(ret == -EAGAIN))
> > -                     goto retry;
> >               uprobe_unregister(uprobe, uc);
> >               return ERR_PTR(ret);
>
> SNIP