On Fri, Aug 28, 2020 at 03:00:59AM +0900, Masami Hiramatsu wrote:
> On Thu, 27 Aug 2020 18:12:40 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > +static void invalidate_rp_inst(struct task_struct *t, struct kretprobe *rp)
> > +{
> > +	struct invl_rp_ipi iri = {
> > +		.task = t,
> > +		.rp = rp,
> > +		.done = false
> > +	};
> > +
> > +	for (;;) {
> > +		if (try_invoke_on_locked_down_task(t, __invalidate_rp_inst, rp))
> > +			return;
> > +
> > +		smp_call_function_single(task_cpu(t), __invalidate_rp_ipi, &iri, 1);
> > +		if (iri.done)
> > +			return;
> > +	}
>
> Hmm, what about making a status placeholder and pointing to it from
> each instance to tell whether it is valid or not?
>
> struct kretprobe_holder {
> 	atomic_t refcnt;
> 	struct kretprobe *rp;
> };
>
> struct kretprobe {
> 	...
> 	struct kretprobe_holder *rph;	// allocate at register
> 	...
> };
>
> struct kretprobe_instance {
> 	...
> 	struct kretprobe_holder *rph;	// free if refcnt == 0
> 	...
> };
>
> cleanup_rp_inst(struct kretprobe *rp)
> {
> 	rp->rph->rp = NULL;
> }
>
> kretprobe_trampoline_handler()
> {
> 	...
> 	rp = READ_ONCE(ri->rph->rp);
> 	if (likely(rp)) {
> 		// call rp->handler
> 	} else
> 		call_rcu(&ri->rcu, free_rp_inst_rcu);
> 	...
> }
>
> free_rp_inst_rcu()
> {
> 	if (!atomic_dec_return(&ri->rph->refcnt))
> 		kfree(ri->rph);
> 	kfree(ri);
> }
>
> This increases the size of kretprobe_instance a bit, but makes things
> simpler (and it stays lockless; the atomic op is in the RCU callback).

Yes, much better.

Although I'd _love_ to get rid of rp->data_size; then we could simplify
all of this even more. I was thinking we could then have a single global
freelist thing and add some per-cpu cache to it (say 4-8 entries) to
avoid the worst contention.

And then make function-graph use this, instead of the other way around
:-)
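To make the holder idea concrete, here is a minimal single-threaded userspace sketch in C11 (<stdatomic.h>). It rests on several assumptions that differ from the kernel: malloc/free stand in for kmalloc/kfree, there is no RCU (the final free happens inline instead of via call_rcu()), instances are allocated per call instead of being recycled from rp's pool, and a `hits` counter stands in for rp->handler. One deliberate deviation from the quoted sketch: the registration itself holds one reference on the holder and drops it at unregister time, so the holder is freed even when no instance is outstanding. register_sketch(), unregister_sketch(), get_instance(), put_instance(), put_holder() and trampoline_handler() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct kretprobe;

/* Hypothetical userspace stand-in for the proposed kretprobe_holder. */
struct kretprobe_holder {
	atomic_int refcnt;		/* registration ref + one per live instance */
	_Atomic(struct kretprobe *) rp;	/* NULL once the probe is unregistered */
};

struct kretprobe {
	int hits;			/* stand-in for rp->handler side effects */
	struct kretprobe_holder *rph;	/* allocated at registration */
};

struct kretprobe_instance {
	struct kretprobe_holder *rph;	/* keeps the holder alive, not rp */
};

static void put_holder(struct kretprobe_holder *rph)
{
	int old = atomic_fetch_sub(&rph->refcnt, 1);

	assert(old > 0);		/* catches an unbalanced put */
	if (old == 1)			/* we dropped the last reference */
		free(rph);
}

static int register_sketch(struct kretprobe *rp)
{
	rp->rph = malloc(sizeof(*rp->rph));
	if (!rp->rph)
		return -1;
	atomic_init(&rp->rph->refcnt, 1);	/* the registration's own ref */
	atomic_init(&rp->rph->rp, rp);
	return 0;
}

static void unregister_sketch(struct kretprobe *rp)
{
	struct kretprobe_holder *rph = rp->rph;

	/* Invalidate every in-flight instance at once (cleanup_rp_inst). */
	atomic_store_explicit(&rph->rp, NULL, memory_order_release);
	rp->rph = NULL;
	put_holder(rph);
}

static struct kretprobe_instance *get_instance(struct kretprobe *rp)
{
	struct kretprobe_instance *ri = malloc(sizeof(*ri));

	if (!ri)
		return NULL;
	ri->rph = rp->rph;
	atomic_fetch_add(&ri->rph->refcnt, 1);
	return ri;
}

static void put_instance(struct kretprobe_instance *ri)
{
	struct kretprobe_holder *rph = ri->rph;

	/* The quoted sketch defers this to an RCU callback; here it is inline. */
	free(ri);
	put_holder(rph);
}

static void trampoline_handler(struct kretprobe_instance *ri)
{
	struct kretprobe *rp =
		atomic_load_explicit(&ri->rph->rp, memory_order_acquire);

	if (rp)
		rp->hits++;		/* probe still registered: "call" handler */
	put_instance(ri);
}
```

In use: register a probe, take two instances, let one return, unregister while the other is still in flight, then let the late one return. The late return sees a NULL rp and skips the handler, and its put drops the last reference, freeing the holder. No per-task walk or IPI is needed to invalidate the outstanding instance.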