Hi Beau,

Thanks for updating. This series looks good to me.

Acked-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>

for this series.

Regards,

On Tue, 18 Jan 2022 12:43:14 -0800
Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:

> User mode processes that wish to use trace events to get data into
> ftrace, perf, eBPF, etc. are limited to uprobes today. The user events
> feature enables an ABI for user mode processes to create and write to
> trace events that are isolated from kernel level trace events. This
> enables a faster path for tracing from user mode data as well as opens
> managed code to participate in trace events, where stub locations are
> dynamic.
>
> User processes often want to trace only when it's useful. To enable this
> a set of pages are mapped into the user process space that indicate the
> current state of the user events that have been registered. User
> processes can check if their event is hooked to a trace/probe, and if it
> is, emit the event data out via the write() syscall.
>
> Two new files are introduced into tracefs to accomplish this:
> user_events_status - This file is mmap'd into participating user mode
> processes to indicate event status.
>
> user_events_data - This file is opened and register/delete ioctls are
> issued to create/open/delete trace events that can be used for tracing.
>
> The typical scenario is on process start to mmap user_events_status. Processes
> then register the events they plan to use via the REG ioctl. The ioctl reads
> and updates the passed in user_reg struct. The status_index of the struct is
> used to know the byte in the status page to check for that event. The
> write_index of the struct is used to describe that event when writing out to
> the fd that was used for the ioctl call. The data must always include this
> index first when writing out data for an event. Data can be written either by
> write() or by writev().
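
For the plain write() case, the index-first layout described above could be
sketched roughly as below. This is only an illustration: pack_event() is a
hypothetical helper name, not something from the series, and it assumes the
index is a native int as in the quoted description.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the series): lay out one event in a
 * single buffer for write(), with write_index first and the payload
 * immediately after it.
 *
 * Returns the number of bytes to pass to write(), or 0 if buf is too
 * small to hold the index plus the payload.
 */
static size_t pack_event(void *buf, size_t cap, int write_index,
			 const void *payload, size_t payload_len)
{
	/* Written this way so the size check cannot overflow. */
	if (cap < sizeof(write_index) ||
	    payload_len > cap - sizeof(write_index))
		return 0;

	/* The index always comes first. */
	memcpy(buf, &write_index, sizeof(write_index));

	/* Event data follows directly after the index. */
	if (payload_len)
		memcpy((char *)buf + sizeof(write_index), payload, payload_len);

	return sizeof(write_index) + payload_len;
}
```

The returned length would then be handed to write(data_fd, buf, len) once the
status byte for the event is non-zero. writev() avoids this extra copy, which
is presumably why the iovec form is shown in the cover letter.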
>
> For example, in memory:
> int index;
> char data[];
>
> Pseudo code example of typical usage:
> struct user_reg reg;
>
> int page_fd = open("user_events_status", O_RDWR);
> char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
> close(page_fd);
>
> int data_fd = open("user_events_data", O_RDWR);
>
> reg.size = sizeof(reg);
> reg.name_args = (__u64)"test";
>
> ioctl(data_fd, DIAG_IOCSREG, &reg);
> int status_id = reg.status_index;
> int write_id = reg.write_index;
>
> struct iovec io[2];
> io[0].iov_base = &write_id;
> io[0].iov_len = sizeof(write_id);
> io[1].iov_base = payload;
> io[1].iov_len = sizeof(payload);
>
> if (page_data[status_id])
> writev(data_fd, io, 2);
>
> User events are also exposed via the dynamic_events tracefs file for
> both create and delete. Current status is exposed via the user_events_status
> tracefs file.
>
> Simple example to register a user event via dynamic_events:
> echo u:test >> dynamic_events
> cat dynamic_events
> u:test
>
> If an event is hooked to a probe, the probe hooked shows up:
> echo 1 > events/user_events/test/enable
> cat user_events_status
> 1:test # Used by ftrace
>
> Active: 1
> Busy: 1
> Max: 4096
>
> If an event is not hooked to a probe, no probe status shows up:
> echo 0 > events/user_events/test/enable
> cat user_events_status
> 1:test
>
> Active: 1
> Busy: 0
> Max: 4096
>
> Users can describe the trace event format via the following format:
> name[:FLAG1[,FLAG2...]] [field1[;field2...]]
>
> Each field has the following format:
> type name
>
> Example for char array with a size of 20 named msg:
> echo 'u:detailed char[20] msg' >> dynamic_events
> cat dynamic_events
> u:detailed char[20] msg
>
> Data offsets are based on the data written out via write() and will be
> updated to reflect the correct offset in the trace_event fields. For dynamic
> data it is recommended to use the new __rel_loc data type. This type will be
> the same as __data_loc, but the offset is relative to this entry. This allows
> user_events to not worry about what common fields are being inserted before
> the data.
>
> The above format is valid for both the ioctl and the dynamic_events file.
>
> V2:
> Fixed kmalloc vs kzalloc for register_page.
> Renamed user_event_mmap to user_event_status.
> Renamed user_event prefix from ue to u.
> Added seq_* operations to user_event_status to enable cat output.
> Aligned field parsing to synth_events format (+ size specifier for
> custom/user types).
> Added uapi header user_events.h to align kernel and user ABI definitions.
>
> V3:
> Updated ABI to handle single FD into many events via an int header.
> Added iovec/writev support to enable int header without payload changes.
> Updated bpf context to describe if data is coming from user, kernel or
> raw iovec.
> Added flag support for registering event, allows forcing BPF to always
> receive the direct iovecs for sensitive code paths that do not want
> copies.
>
> V4:
> Moved to struct user_reg for registering events via ioctl.
> Added unit tests for ftrace, dyn_events and perf integration.
> Added print_fmt generation and proper dyn_events matching statements.
> Reduced time in preemption disabled paths.
> Added documentation file.
> Pre-fault in data when preemption is enabled and use no-fault copy in probes.
> Fixed MIPS missing PAGE_READONLY define.
>
> V5:
> Rebase to linux-trace for-next branch.
> Added sample code into samples/user_events.
> Switched to str_has_prefix in various locations.
> Allow hex in array sizes and ensure reasonable sizes are used.
> Moved lifetime of name buffer when parsing to the caller for failure paths.
> Fixed documentation nits and index.
> Ensure event isn't busy before freeing through dyn_events.
> Properly handle failure case for ftrace and perf in fault cases for buffers.
> Ensure write data is over min size and null terminated for dynamic arrays.
>
> V6:
> Fixed endian issue with dyn loc decoding (use u32).
> Fixed size_t conversion warning on hexagon arch (min vs min_t).
> Handle cases for __get_str vs __get_rel_str in print_fmt generation.
> Add additional comments around various event member lifetimes.
> Reduced max field array size to 1K.
>
> V7:
> Acquire reg_mutex during release, ensure refs cannot change under any situation.
> Remove default n from Kconfig.
> Move from static 0644 mode to TRACE_MODE_WRITE.
>
> V8:
> Squashed UABI header into ftrace minimal patch thread.
> Moved pagefault_disable/enable into copy_nofault.
> Moved to strscpy vs custom copy when getting array size from type.
> Made patch bisect friendly by ensuring tests are split from kernel code.
>
> V9:
> Rebase to linux-trace ftrace/core branch.
> Added comments for user_reg and other structs in user_events.h.
> Moved from delayed seq_file to pre-created seq_file for status file.
> Added deleting events to documentation and expanded registering section.
> Reordered patches to make reviewing easier.
> Fixed nitpicks.
>
> V10:
> Fix struct size case not writing size out to dynamic_events.
> Fix warning for NULL pointer arithmetic in user_seq_start.
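
On the dyn loc decoding mentioned in the V6 notes: a reader of the payload
might decode a __rel_loc entry roughly as below. This is a sketch under
assumptions, not the code from the series. It assumes the same packing that
__data_loc uses (low 16 bits offset, high 16 bits length) and that "relative
to this entry" means the offset is counted from the end of the 4-byte field.
rel_loc_decode() and struct dyn_span are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: where the dynamic data sits in the payload. */
struct dyn_span {
	size_t start;	/* byte offset from the start of the payload */
	size_t len;	/* length of the dynamic data in bytes */
};

/*
 * Hypothetical decoder for one __rel_loc field located at field_off.
 *
 * Assumptions (not confirmed by the cover letter):
 *   - the field is a u32, low 16 bits = offset, high 16 bits = length
 *   - the offset is relative to the end of the field itself
 *
 * Returns 0 on success, -1 if the field or its data would fall outside
 * the payload.
 */
static int rel_loc_decode(const uint8_t *payload, size_t payload_len,
			  size_t field_off, struct dyn_span *out)
{
	uint32_t loc;
	size_t start, len;

	/* The 4-byte field itself must fit in the payload. */
	if (payload_len < sizeof(loc) || field_off > payload_len - sizeof(loc))
		return -1;

	/* Copy out as a u32 to avoid unaligned access. */
	memcpy(&loc, payload + field_off, sizeof(loc));

	start = field_off + sizeof(loc) + (loc & 0xffff);
	len = loc >> 16;

	/* The data it points at must also fit. */
	if (start > payload_len || len > payload_len - start)
		return -1;

	out->start = start;
	out->len = len;
	return 0;
}
```

Because the offset is relative, the same decode works no matter how many
common fields end up in front of the user data, which matches the reason
given for preferring __rel_loc.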
>
> Beau Belgrave (12):
>   user_events: Add minimal support for trace_event into ftrace
>   user_events: Add print_fmt generation support for basic types
>   user_events: Handle matching arguments from dyn_events
>   user_events: Add basic perf and eBPF support
>   user_events: Optimize writing events by only copying data once
>   user_events: Validate user payloads for size and null termination
>   user_events: Add self-test for ftrace integration
>   user_events: Add self-test for dynamic_events integration
>   user_events: Add self-test for perf_event integration
>   user_events: Add self-test for validator boundaries
>   user_events: Add sample code for typical usage
>   user_events: Add documentation file
>
>  Documentation/trace/index.rst                 |    1 +
>  Documentation/trace/user_events.rst           |  216 +++
>  include/uapi/linux/user_events.h              |  116 ++
>  kernel/trace/Kconfig                          |   14 +
>  kernel/trace/Makefile                         |    1 +
>  kernel/trace/trace_events_user.c              | 1617 +++++++++++++++++
>  samples/user_events/Makefile                  |    5 +
>  samples/user_events/example.c                 |   91 +
>  tools/testing/selftests/user_events/Makefile  |    9 +
>  .../testing/selftests/user_events/dyn_test.c  |  130 ++
>  .../selftests/user_events/ftrace_test.c       |  452 +++++
>  .../testing/selftests/user_events/perf_test.c |  168 ++
>  tools/testing/selftests/user_events/settings  |    1 +
>  13 files changed, 2821 insertions(+)
>  create mode 100644 Documentation/trace/user_events.rst
>  create mode 100644 include/uapi/linux/user_events.h
>  create mode 100644 kernel/trace/trace_events_user.c
>  create mode 100644 samples/user_events/Makefile
>  create mode 100644 samples/user_events/example.c
>  create mode 100644 tools/testing/selftests/user_events/Makefile
>  create mode 100644 tools/testing/selftests/user_events/dyn_test.c
>  create mode 100644 tools/testing/selftests/user_events/ftrace_test.c
>  create mode 100644 tools/testing/selftests/user_events/perf_test.c
>  create mode 100644 tools/testing/selftests/user_events/settings
>
>
> base-commit: 85c62c8c3749eec02ba81217bdcac26867dc262e
> --
> 2.17.1
>

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>