----- On Feb 3, 2021, at 11:05 AM, rostedt rostedt@xxxxxxxxxxx wrote:

> From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
>
> The list of tracepoint callbacks is managed by an array that is protected
> by RCU. To update this array, a new array is allocated, the updates are
> copied over to the new array, and then the list of functions for the
> tracepoint is switched over to the new array. After a completion of an RCU
> grace period, the old array is freed.
>
> This process happens for both adding a callback as well as removing one.
> But on removing a callback, if the new array fails to be allocated, the
> callback is not removed, and may be used after it is freed by the clients
> of the tracepoint.
>
> There's really no reason to fail if the allocation for a new array fails
> when removing a function. Instead, the function can simply be replaced by a
> stub function that could be cleaned up on the next modification of the
> array. That is, instead of calling the function registered to the
> tracepoint, it would call a stub function in its place.
>
> Link: https://lore.kernel.org/r/20201115055256.65625-1-mmullins@xxxxxxx
> Link: https://lore.kernel.org/r/20201116175107.02db396d@xxxxxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/r/20201117211836.54acaef2@xxxxxxxxxxxxxxxx
> Link: https://lkml.kernel.org/r/20201118093405.7a6d2290@xxxxxxxxxxxxxxxxxx
>
> [ Note, this version does use undefined compiler behavior (assuming that
>   a stub function with no parameters and no return value can be called by a
>   location that thinks it has parameters but still no return value).
>   Static calls do the same thing, so this trick is not without precedent.
>
>   There's another solution that uses RCU tricks and is more complex, but
>   can be an alternative if this solution becomes an issue.
>
>   Link: https://lore.kernel.org/lkml/20210127170721.58bce7cc@xxxxxxxxxxxxxxxxxx/
> ]
>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Martin KaFai Lau <kafai@xxxxxx>
> Cc: Song Liu <songliubraving@xxxxxx>
> Cc: Yonghong Song <yhs@xxxxxx>
> Cc: Andrii Nakryiko <andriin@xxxxxx>
> Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> Cc: KP Singh <kpsingh@xxxxxxxxxxxx>
> Cc: netdev <netdev@xxxxxxxxxxxxxxx>
> Cc: bpf <bpf@xxxxxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Florian Weimer <fw@xxxxxxxxxxxxx>
> Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
> Reported-by: syzbot+83aa762ef23b6f0d1991@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: syzbot+d29e58bb557324e55e5e@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: Matt Mullins <mmullins@xxxxxxx>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> Tested-by: Matt Mullins <mmullins@xxxxxxx>
> ---
>  kernel/tracepoint.c | 80 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 64 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 7261fa0f5e3c..e8f20ae29c18 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -53,6 +53,12 @@ struct tp_probes {
>      struct tracepoint_func probes[];
>  };
>
> +/* Called in removal of a func but failed to allocate a new tp_funcs */
> +static void tp_stub_func(void)
> +{
> +    return;
> +}
> +
>  static inline void *allocate_probes(int count)
>  {
>      struct tp_probes *p = kmalloc(struct_size(p, probes, count),
> @@ -131,6 +137,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  {
>      struct tracepoint_func *old, *new;
>      int nr_probes = 0;
> +    int stub_funcs = 0;
>      int pos = -1;
>
>      if (WARN_ON(!tp_func->func))
> @@ -147,14 +154,34 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>              if (old[nr_probes].func == tp_func->func &&
>                  old[nr_probes].data == tp_func->data)
>                  return ERR_PTR(-EEXIST);
> +            if (old[nr_probes].func == tp_stub_func)
> +                stub_funcs++;
>          }
>      }
> -    /* + 2 : one for new probe, one for NULL func */
> -    new = allocate_probes(nr_probes + 2);
> +    /* + 2 : one for new probe, one for NULL func - stub functions */
> +    new = allocate_probes(nr_probes + 2 - stub_funcs);
>      if (new == NULL)
>          return ERR_PTR(-ENOMEM);
>      if (old) {
> -        if (pos < 0) {
> +        if (stub_funcs) {

Considering that we end up implementing a case where we carefully copy
over each item, I recommend we replace the two "memcpy" branches by a
single item-wise implementation. It is a slow path anyway, and reducing
the overall complexity is a benefit for slow paths: fewer bugs, less code
to review, and it is easier to reach decent coverage of the testing
state space.

> +            /* Need to copy one at a time to remove stubs */
> +            int probes = 0;
> +
> +            pos = -1;
> +            for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> +                if (old[nr_probes].func == tp_stub_func)
> +                    continue;
> +                if (pos < 0 && old[nr_probes].prio < prio)
> +                    pos = probes++;
> +                new[probes++] = old[nr_probes];
> +            }
> +            nr_probes = probes;

Repurposing "nr_probes" from counting the number of items in the old
array to counting the number of items in the new array in the middle of
the function is confusing.

> +            if (pos < 0)
> +                pos = probes;
> +            else
> +                nr_probes--;    /* Account for insertion */

This is probably why you need to play tricks with nr_probes here.
> +        } else if (pos < 0) {
>              pos = nr_probes;
>              memcpy(new, old, nr_probes * sizeof(struct tracepoint_func));
>          } else {
> @@ -188,8 +215,9 @@ static void *func_remove(struct tracepoint_func **funcs,
>      /* (N -> M), (N > 1, M >= 0) probes */
>      if (tp_func->func) {
>          for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> -            if (old[nr_probes].func == tp_func->func &&
> -                old[nr_probes].data == tp_func->data)
> +            if ((old[nr_probes].func == tp_func->func &&
> +                 old[nr_probes].data == tp_func->data) ||
> +                old[nr_probes].func == tp_stub_func)
>                  nr_del++;
>          }
>      }
> @@ -208,14 +236,32 @@ static void *func_remove(struct tracepoint_func **funcs,
>      /* N -> M, (N > 1, M > 0) */
>      /* + 1 for NULL */
>      new = allocate_probes(nr_probes - nr_del + 1);
> -    if (new == NULL)
> -        return ERR_PTR(-ENOMEM);
> -    for (i = 0; old[i].func; i++)
> -        if (old[i].func != tp_func->func
> -            || old[i].data != tp_func->data)
> -            new[j++] = old[i];
> -    new[nr_probes - nr_del].func = NULL;
> -    *funcs = new;
> +    if (new) {
> +        for (i = 0; old[i].func; i++)
> +            if ((old[i].func != tp_func->func
> +                 || old[i].data != tp_func->data)
> +                && old[i].func != tp_stub_func)
> +                new[j++] = old[i];
> +        new[nr_probes - nr_del].func = NULL;
> +        *funcs = new;
> +    } else {
> +        /*
> +         * Failed to allocate, replace the old function
> +         * with calls to tp_stub_func.
> +         */
> +        for (i = 0; old[i].func; i++)
> +            if (old[i].func == tp_func->func &&
> +                old[i].data == tp_func->data) {
> +                old[i].func = tp_stub_func;

This updates "func" while readers are loading it concurrently. I would
recommend using WRITE_ONCE() here, paired with READ_ONCE() within
__traceiter_##_name.

> +                /* Set the prio to the next event. */

I don't get why the priority needs to be changed here. Could it simply
stay at its original value? It is already in the correct priority order
anyway.

> +                if (old[i + 1].func)
> +                    old[i].prio =
> +                        old[i + 1].prio;
> +                else
> +                    old[i].prio = -1;
> +            }
> +        *funcs = old;

I'm not sure what setting *funcs to old achieves?
Isn't it already pointing to old?

I'll send a patch which applies on top of yours implementing my
recommendations. It shrinks the code complexity nicely:

 include/linux/tracepoint.h |  2 +-
 kernel/tracepoint.c        | 80 +++++++++++++------------------------
 2 files changed, 28 insertions(+), 54 deletions(-)

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com