On Sun, Nov 28, 2021 at 10:49:54PM +0900, Masami Hiramatsu wrote: > On Wed, 24 Nov 2021 09:41:12 +0100 > Jiri Olsa <jolsa@xxxxxxxxxx> wrote: > > > Adding support to create multiple probes within single perf event. > > This way we can associate single bpf program with multiple kprobes, > > because bpf program gets associated with the perf event. > > > > The perf_event_attr is not extended, current fields for kprobe > > attachment are used for multi attachment. > > > > For current kprobe atachment we use either: > > > > kprobe_func (in config1) + probe_offset (in config2) > > > > to define kprobe by function name with offset, or: > > > > kprobe_addr (in config2) > > > > to define kprobe with direct address value. > > > > For multi probe attach the same fields point to array of values > > with the same semantic. Each probe is defined as set of values > > with the same array index (idx) as: > > > > kprobe_func[idx] + probe_offset[idx] > > > > to define kprobe by function name with offset, or: > > > > kprobe_addr[idx] > > > > to define kprobe with direct address value. > > > > The number of probes is passed in probe_cnt value, which shares > > the union with wakeup_events/wakeup_watermark values which are > > not used for kprobes. > > > > Since [1] it's possible to stack multiple probes events under > > one head event. Using the same code to allow that for probes > > defined under perf kprobe interface. > > OK, so you also want to add multi-probes on single event by > single perf-event syscall. Not defining different events. correct.. bpf program is then attached to perf event with ioctl call.. this way we can have multiple probes attached to single bpf program > > Those are bit different, multi-probes on single event can > invoke single event handler from different probe points. For > exapmple same user bpf handler will be invoked from different > address. right, that's the goal, having single bpf program executed from multiple probes > > > > > [1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/ > > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx> > > --- > > include/uapi/linux/perf_event.h | 1 + > > kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++---- > > kernel/trace/trace_kprobe.c | 47 ++++++++++++-- > > kernel/trace/trace_probe.c | 2 +- > > kernel/trace/trace_probe.h | 3 +- > > 5 files changed, 138 insertions(+), 21 deletions(-) > > > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h > > index bd8860eeb291..eea80709d1ed 100644 > > --- a/include/uapi/linux/perf_event.h > > +++ b/include/uapi/linux/perf_event.h > > @@ -414,6 +414,7 @@ struct perf_event_attr { > > union { > > __u32 wakeup_events; /* wakeup every n events */ > > __u32 wakeup_watermark; /* bytes before wakeup */ > > + __u32 probe_cnt; /* number of [k,u] probes */ > > }; > > > > __u32 bp_type; > > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c > > index a114549720d6..26078e40c299 100644 > > --- a/kernel/trace/trace_event_perf.c > > +++ b/kernel/trace/trace_event_perf.c > > @@ -245,23 +245,27 @@ void perf_trace_destroy(struct perf_event *p_event) > > } > > > > #ifdef CONFIG_KPROBE_EVENTS > > -int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe) > > +static struct trace_event_call* > > +kprobe_init(bool is_retprobe, u64 kprobe_func, u64 kprobe_addr, > > + u64 probe_offset, struct trace_event_call *old) > > { > > int ret; > > char *func = NULL; > > struct trace_event_call *tp_event; > > > > - if (p_event->attr.kprobe_func) { > > + if (kprobe_func) { > > func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL); > > if (!func) > > - return -ENOMEM; > > + return ERR_PTR(-ENOMEM); > > ret = strncpy_from_user( > > - func, u64_to_user_ptr(p_event->attr.kprobe_func), > > + func, u64_to_user_ptr(kprobe_func), > > KSYM_NAME_LEN); > > if (ret == KSYM_NAME_LEN) > > ret = -E2BIG; > > - if (ret < 0) > > - goto out; > > + if (ret < 0) { > > + kfree(func); > > + return ERR_PTR(ret); > > + } > > > > if (func[0] == '\0') { > > kfree(func); > > @@ -270,20 +274,96 @@ int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe) > > } > > > > tp_event = create_local_trace_kprobe( > > - func, (void *)(unsigned long)(p_event->attr.kprobe_addr), > > - p_event->attr.probe_offset, is_retprobe); > > - if (IS_ERR(tp_event)) { > > - ret = PTR_ERR(tp_event); > > - goto out; > > + func, (void *)(unsigned long)(kprobe_addr), > > + probe_offset, is_retprobe, old); > > Hmm, here I have a concern (maybe no real issue is caused at this momemnt.) > Since ftrace's multi-probe event has same event/group name among the > probes's internal event-calls. However, create_local_trace_kprobe() > actually uses the "func" name for the event name. > I think you should choose a randome different "representative" event name > for the event (not probe), and share it among the probes on the event, > if the perf event has no event name. > > (I'm not sure how the event names are used from inside of the BPF programs, > but just for the consistency.) ok, I don't think event names are used, I'll check > > > + kfree(func); > > + return tp_event; > > +} > > + > > +static struct trace_event_call* > > +kprobe_init_multi(struct perf_event *p_event, bool is_retprobe) > > +{ > > + void __user *kprobe_func = u64_to_user_ptr(p_event->attr.kprobe_func); > > + void __user *kprobe_addr = u64_to_user_ptr(p_event->attr.kprobe_addr); > > + u64 *funcs = NULL, *addrs = NULL, *offs = NULL; > > + struct trace_event_call *tp_event, *tp_old = NULL; > > + u32 i, cnt = p_event->attr.probe_cnt; > > + int ret = -EINVAL; > > + size_t size; > > + > > + if (!cnt) > > + return ERR_PTR(-EINVAL); > > + > > + size = cnt * sizeof(u64); > > + if (kprobe_func) { > > + ret = -ENOMEM; > > + funcs = kmalloc(size, GFP_KERNEL); > > + if (!funcs) > > + goto out; > > + ret = -EFAULT; > > + if (copy_from_user(funcs, kprobe_func, size)) > > + goto out; > > + } > > + > > + if (kprobe_addr) { > > + ret = -ENOMEM; > > + addrs = kmalloc(size, GFP_KERNEL); > > + if (!addrs) > > + goto out; > > + ret = -EFAULT; > > + if (copy_from_user(addrs, kprobe_addr, size)) > > + goto out; > > + /* addresses and ofsets share the same array */ > > + offs = addrs; > > } > > > > + for (i = 0; i < cnt; i++) { > > + tp_event = kprobe_init(is_retprobe, funcs ? funcs[i] : 0, > > + addrs ? addrs[i] : 0, offs ? offs[i] : 0, > > + tp_old); > > + if (IS_ERR(tp_event)) { > > + if (tp_old) > > + destroy_local_trace_kprobe(tp_old); > > + ret = PTR_ERR(tp_event); > > + goto out; > > + } > > + if (!tp_old) > > + tp_old = tp_event; > > + } > > + ret = 0; > > +out: > > + kfree(funcs); > > + kfree(addrs); > > + return ret ? ERR_PTR(ret) : tp_old; > > +} > > + > > +static struct trace_event_call* > > +kprobe_init_single(struct perf_event *p_event, bool is_retprobe) > > +{ > > + struct perf_event_attr *attr = &p_event->attr; > > + > > + return kprobe_init(is_retprobe, attr->kprobe_func, attr->kprobe_addr, > > + attr->probe_offset, NULL); > > +} > > + > > +int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe) > > +{ > > + struct trace_event_call *tp_event; > > + int ret; > > + > > + if (p_event->attr.probe_cnt) > > isn't this "p_event->attr.probe_cnt > 1" ? right, probe_cnt is just added by this patchset and used only when there are multiple probes being attached, so that's why it works even with 1 at the moment will change > > > + tp_event = kprobe_init_multi(p_event, is_retprobe); > > + else > > + tp_event = kprobe_init_single(p_event, is_retprobe); > > + > > + if (IS_ERR(tp_event)) > > + return PTR_ERR(tp_event); > > + > > mutex_lock(&event_mutex); > > ret = perf_trace_event_init(tp_event, p_event); > > if (ret) > > destroy_local_trace_kprobe(tp_event); > > mutex_unlock(&event_mutex); > > -out: > > - kfree(func); > > return ret; > > } > > > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > > index 33272a7b6912..86a7aada853a 100644 > > --- a/kernel/trace/trace_kprobe.c > > +++ b/kernel/trace/trace_kprobe.c > > @@ -237,13 +237,18 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs); > > static int kretprobe_dispatcher(struct kretprobe_instance *ri, > > struct pt_regs *regs); > > > > +static void __free_trace_kprobe(struct trace_kprobe *tk) > > +{ > > + kfree(tk->symbol); > > + free_percpu(tk->nhit); > > + kfree(tk); > > +} > > + > > static void free_trace_kprobe(struct trace_kprobe *tk) > > { > > if (tk) { > > trace_probe_cleanup(&tk->tp); > > - kfree(tk->symbol); > > - free_percpu(tk->nhit); > > - kfree(tk); > > + __free_trace_kprobe(tk); > > Why is this needed? I needed some free function that does not call trace_probe_cleanup and trace_probe_unlink.. I wrote details in the destroy_local_trace_kprobe comment below > > > } > > } > > > > @@ -1796,7 +1801,7 @@ static int unregister_kprobe_event(struct trace_kprobe *tk) > > /* create a trace_kprobe, but don't add it to global lists */ > > struct trace_event_call * > > create_local_trace_kprobe(char *func, void *addr, unsigned long offs, > > - bool is_return) > > + bool is_return, struct trace_event_call *old) > > { > > enum probe_print_type ptype; > > struct trace_kprobe *tk; > > @@ -1820,6 +1825,28 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs, > > return ERR_CAST(tk); > > } > > > > + if (old) { > > + struct trace_kprobe *tk_old; > > + > > + tk_old = trace_kprobe_primary_from_call(old); > > So, this will choose the first(primary) probe's function name as > the representative event name. But other probes can have different > event names. ok > > > + if (!tk_old) { > > + ret = -EINVAL; > > + goto error; > > + } > > + > > + /* Append to existing event */ > > + ret = trace_probe_append(&tk->tp, &tk_old->tp); > > + if (ret) > > + goto error; > > + > > + /* Register k*probe */ > > + ret = __register_trace_kprobe(tk); > > + if (ret) > > + goto error; > > If "appended" probe failed to register, it must be "unlinked" from > the first one and goto error to free the trace_kprobe. > > if (ret) { > trace_probe_unlink(&tk->tp); > goto error; > } > > See append_trace_kprobe() for details. so there's goto error jumping to: error: free_trace_kprobe(tk); that calls: trace_probe_cleanup -> trace_probe_unlink that should do it, right? > > > + > > + return trace_probe_event_call(&tk->tp); > > + } > > + > > init_trace_event_call(tk); > > > > ptype = trace_kprobe_is_return(tk) ? > > @@ -1841,6 +1868,8 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs, > > > > void destroy_local_trace_kprobe(struct trace_event_call *event_call) > > { > > + struct trace_probe_event *event; > > + struct trace_probe *pos, *tmp; > > struct trace_kprobe *tk; > > > > tk = trace_kprobe_primary_from_call(event_call); > > @@ -1852,9 +1881,15 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call) > > return; > > } > > > > - __unregister_trace_kprobe(tk); > > + event = tk->tp.event; > > + list_for_each_entry_safe(pos, tmp, &event->probes, list) { > > + list_for_each_entry_safe(pos, tmp, &event->probes, list) { > > + list_del_init(&pos->list); > > + __unregister_trace_kprobe(tk); > > + __free_trace_kprobe(tk); > > + } > > > > - free_trace_kprobe(tk); > > + trace_probe_event_free(event); > > Actually, each probe already allocated the trace_probe events (which are not > used if it is appended). Thus you have to use trace_probe_unlink(&tk->tp) in > the above loop. > > list_for_each_entry_safe(pos, tmp, &event->probes, list) { > list_for_each_entry_safe(pos, tmp, &event->probes, list) { > __unregister_trace_kprobe(tk); > trace_probe_unlink(&tk->tp); /* This will call trace_probe_event_free() internally */ > free_trace_kprobe(tk); > } so calling trace_probe_event_free inside this loop is a problem, because the loop iterates that trace_probe_event's probes list, and last probe removed will trigger trace_probe_event_free, that will free the list we iterate.. and we go down ;-) so that's why I added new free function '__free_trace_kprobe' that frees everything as free_trace_kprobe, but does not call trace_probe_unlink event = tk->tp.event; list_for_each_entry_safe(pos, tmp, &event->probes, list) { list_for_each_entry_safe(pos, tmp, &event->probes, list) { list_del_init(&pos->list); __unregister_trace_kprobe(tk); __free_trace_kprobe(tk); } trace_probe_event_free(event); and there's trace_probe_event_free(event) to make the final free thanks, jirka