On Wed, Dec 08, 2021 at 06:19:05PM -0500, Steven Rostedt wrote: > On Wed, 1 Dec 2021 10:25:04 -0800 > Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote: > > > Minimal support for interacting with dynamic events, trace_event and > > ftrace. Core outline of flow between user process, ioctl and trace_event > > APIs. > > > > Signed-off-by: Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> > > --- > > kernel/trace/Kconfig | 15 + > > kernel/trace/Makefile | 1 + > > kernel/trace/trace_events_user.c | 1192 ++++++++++++++++++++++++++++++ > > 3 files changed, 1208 insertions(+) > > create mode 100644 kernel/trace/trace_events_user.c > > > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig > > index 420ff4bc67fd..21d00092436b 100644 > > --- a/kernel/trace/Kconfig > > +++ b/kernel/trace/Kconfig > > @@ -724,6 +724,21 @@ config SYNTH_EVENTS > > > > If in doubt, say N. > > > > +config USER_EVENTS > > + bool "User trace events" > > + select TRACING > > + select DYNAMIC_EVENTS > > + default n > > default n is default, so you do not need to explicitly state that. > > In other words, the above line is a nop. > Got it. > > + help > > + User trace events are user-defined trace events that > > + can be used like an existing kernel trace event. User trace > > + events are generated by writing to a tracefs file. User > > + processes can determine if their tracing events should be > > + generated by memory mapping a tracefs file and checking for > > + an associated byte being non-zero. > > + > > + If in doubt, say N. > > + > > config HIST_TRIGGERS > > bool "Histogram triggers" > > depends on ARCH_HAVE_NMI_SAFE_CMPXCHG > > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile > > index bedc5caceec7..19ef3758da95 100644 > > --- a/kernel/trace/Makefile > > +++ b/kernel/trace/Makefile > > > > > > +/* > > + * Handles the final close of the file from user mode. > > + */ > > +static int user_events_release(struct inode *node, struct file *file) > > +{ > > + struct user_event_refs *refs; > > + struct user_event *user; > > + int i; > > + > > + /* > > + * refs is protected by RCU and could in theory change immediately > > + * before this call on another core. To ensure we read the latest > > + * version of refs we acquire the RCU read lock again. > > + */ > > + rcu_read_lock_sched(); > > + refs = rcu_dereference_sched(file->private_data); > > + rcu_read_unlock_sched(); > > This still bothers me. Can another CPU call an ioctl here? Sorry :) No, another CPU cannot call the ioctl on the file, since if another CPU had a reference to this file release couldn't be called. user_events_release is only called when the final reference to the file has been closed, so there cannot be another ioctl pending, starting or finishing for this file at the time it is called. The last user mode program to call close() on the file will end up invoking user_events_release. The user_event_refs is only accessible via the file's private_data, which now has zero references when release is executing. This means the private_data can no longer change and the rcu deref ensures we have the latest version. refs is per-file, so while there can be other ioctl's occurring for other files, they are completely different ref objects than the one being cleaned up in the release of the file, it's not shared outside of this file lifetime, which has now ended. > > user_events_ioctl_reg() { > user_events_ref_add() { > refs = rcu_dereference_protected(file->private_data, ..); > new_refs = kzalloc(size, GFP_KERNEL); > rcu_assign_pointer(file->private_data, new_refs); > if (refs) > kfree_rcu(refs, rcu); > > refs now freed. > If user_events_ioctl is executing for that same file, user_events_release could not have been called due to the file being in use to issue the ioctl. > > + > > + if (!refs) > > + goto out; > > + > > + /* > > + * Do not need RCU while enumerating the events that were used. > > + * The lifetime of refs has reached an end, it's tied to this file. > > + * The underlying user_events are ref counted, and cannot be freed. > > + * After this decrement, the user_events may be freed elsewhere. > > + */ > > + for (i = 0; i < refs->count; ++i) { > > Fault on refs->count > > ?? refs after rcu_dereference is checked for null before accessing. refs cannot be changed when release is being executed, since that would mean the ioctl ran without a file reference (not sure how that could happen). This is why it's important that release get the latest version of refs, an ioctl could have JUST happened before the final close() in user mode, and if it jumped CPUs we could (in theory) get an old value. If we got an old value, then yes, the fault could occur. This code uses the file ops release method as a final sync point to clean up everything for that file only after there are no more references to it at all, so nobody can do this kind of thing. Is there some case I am missing where an ioctl on a file can be performed without a reference to that file? Are you worried about a malicious user calling close on the file and then immediately issuing an ioctl on the now closed file? If so, wouldn't ioctl just reject that file reference being used as not in the processes file table / invalid and not let the ioctl go through? > > > + user = refs->events[i]; > > + > > + if (user) > > + atomic_dec(&user->refcnt); > > + } > > + > > + kfree_rcu(refs, rcu); > > +out: > > + return 0; > > +} > > + [..] > > +static const struct file_operations user_status_fops = { > > + .mmap = user_status_mmap, > > + .read = user_status_read, > > + .llseek = user_status_seek, > > + .release = user_status_release, > > +}; > > + > > +/* > > + * Creates a set of tracefs files to allow user mode interactions. > > + */ > > +static int create_user_tracefs(void) > > +{ > > + struct dentry *edata, *emmap; > > + > > + edata = tracefs_create_file("user_events_data", 0644, NULL, > > + NULL, &user_data_fops); > > BTW, I now define: > > TRACE_MODE_WRITE for files to be written to, and TRACE_MODE_READ for files > that are read only. > > And soon tracefs will honor the gid mount option to define what group all > the tracefs files should belong to on mount. Perfect, thank you. > > -- Steve > Thanks, -Beau