On Tue, Aug 03, 2021 at 08:17:18PM -0400, Steven Rostedt wrote:
> On Tue, 3 Aug 2021 15:52:00 -0700
> Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > For clarity, would you like a resend with the user mode code in the
> > description or would you like an in-thread example?
>
> In thread example is fine. I'd just like to understand what exactly you
> plan on doing with it.
>
> -- Steve

Internally we have single trace_events that actually represent many events.
We do this by having the payload start with an int that the eBPF program
always probes first to get a sub-event ID. The sub-event ID is then used to
determine the actual payload format. We typically use this pattern to enable
a single eBPF program to turn on a class of events that require tracing.

An example is tracing out all the network-related errors, of which there may
be many events, all with different payloads.

For example:
struct packeterror {
	int id;
	int packetnumber;
	int errorcode;
};

struct connerror {
	int id;
	int connnumber;
	int ip4;
	int errorcode;
};

Both packeterror and connerror would be output to a trace_event with a name
like ms.net.errors. packeterror might have an id of 0 while connerror would
have an id of 1. eBPF or the trace decoder would check the first int and do
further decoding on the payload (or skip it).

udiag sends user data as an eBPF context struct, so user probing costs are
delayed until the program sees something that warrants it. This limits system
cost when tracing is enabled but only a subset of the events are wanted.
This pattern makes writing an eBPF program that revolves around common data
much easier.

Internally we have macros that translate to the page check followed by the
write syscall. These macros are also used post compile to auto-generate eBPF
probe statements, etc. to make it harder for developers to get the code wrong
on either side.

Here's a simple example showing the general working order and flow we use on
top of udiag.
The payload in this case is very simple (a single int). Payloads in general
can be anything; the kernel side doesn't force a user to use the sub-event
model. It works great for us, but some other users might not require it or
might have their own ideas.

#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <errno.h>
#include <fcntl.h>

#define DIAG_IOC_MAGIC '*'
#define DIAG_IOCSREG _IOW(DIAG_IOC_MAGIC, 0, char*)
#define DIAG_IOCSDEL _IOW(DIAG_IOC_MAGIC, 1, char*)

int udiag_init(char **eventpage)
{
	int ret;
	char *page;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = open("/dev/udiag", O_RDWR);

	if (fd == -1) {
		ret = errno;
		goto out;
	}

	/* Map in single page for event enabled checking */
	page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);

	if (page != MAP_FAILED) {
		*eventpage = page;
		ret = 0;
	} else {
		ret = errno;
	}

	/* The mapping stays valid after the fd is closed */
	close(fd);
out:
	return ret;
}

int udiag_event_open(const char *name, int *eventfd, int *eventindex)
{
	int ret;
	int fd;
	long status;

	fd = open("/dev/udiag", O_RDWR);

	if (fd == -1) {
		ret = errno;
		goto out;
	}

	/* Make data written to this fd log under the named event */
	status = ioctl(fd, DIAG_IOCSREG, name);

	if (status < 0) {
		ret = errno;
		close(fd);
		goto out;
	}

	/* Give caller back an fd for writing to this event */
	*eventfd = fd;

	/* Give caller back the index of the page to check if enabled */
	*eventindex = (int)status;
	ret = 0;
out:
	return ret;
}

int main(int argc, char *argv[])
{
	int err, fd, index, payload;
	char *page;

	err = udiag_init(&page);

	if (err != 0) {
		return err;
	}

	err = udiag_event_open("testevent", &fd, &index);

	if (err != 0) {
		return err;
	}

	payload = 0;
	printf("Press enter to write a testevent\n");

	/* One iteration per key press; stop at EOF instead of spinning */
	while (getchar() != EOF) {
		/* Avoid write overhead when nothing is listening */
		if (page[index]) {
			payload++;

			if (write(fd, &payload, sizeof(payload)) == -1) {
				perror("write");
				continue;
			}

			printf("Logged %d\n", payload);
		} else {
			printf("Event is not being traced currently, enable via:\n");
			printf("echo 1 > /sys/kernel/debug/tracing/events/udiag/testevent/enable\n");
		}
	}

	return 0;
}

Thanks,
-Beau
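
As a footnote to the sub-event pattern described above, decoding on the
consumer side could look roughly like the sketch below. It is only an
illustration in plain user-space C, not the macros or eBPF code mentioned
earlier: the NET_ERR_* constants and net_error_code() are hypothetical names,
and the IDs simply follow the "0 for packeterror, 1 for connerror" example.
An eBPF program would make the equivalent check with its own probe-read
helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Payload layouts from the example above; each starts with the sub-event id */
struct packeterror {
	int id;
	int packetnumber;
	int errorcode;
};

struct connerror {
	int id;
	int connnumber;
	int ip4;
	int errorcode;
};

/* Hypothetical sub-event IDs, chosen to match the example in the text */
enum {
	NET_ERR_PACKET = 0,
	NET_ERR_CONN = 1,
};

/*
 * Hypothetical decoder helper: read the leading int first, then decode the
 * rest of the payload only for sub-events we know about.
 * Returns 0 and fills *errorcode on success, or -1 when the payload is too
 * short or the sub-event is unknown (the caller can then skip it).
 */
int net_error_code(const void *data, size_t len, int *errorcode)
{
	int id;

	assert(data != NULL && errorcode != NULL);

	if (len < sizeof(id))
		return -1;

	/* memcpy avoids alignment assumptions about the raw buffer */
	memcpy(&id, data, sizeof(id));

	switch (id) {
	case NET_ERR_PACKET: {
		struct packeterror e;

		if (len < sizeof(e))
			return -1;

		memcpy(&e, data, sizeof(e));
		*errorcode = e.errorcode;
		return 0;
	}
	case NET_ERR_CONN: {
		struct connerror e;

		if (len < sizeof(e))
			return -1;

		memcpy(&e, data, sizeof(e));
		*errorcode = e.errorcode;
		return 0;
	}
	default:
		/* Unknown sub-event: skip without touching the payload */
		return -1;
	}
}
```

A decoder would call net_error_code(buf, len, &code) on each raw payload
logged under the shared event and ignore anything that returns -1. Only the
first int is read before the decision to decode or skip is made, which is the
point of the pattern.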