Re: [RFC PATCH] udiag - User mode to trace_event (ftrace, perf, eBPF) ABI

Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> · Wed, 4 Aug 2021 09:37:51 -0700

On Tue, Aug 03, 2021 at 08:17:18PM -0400, Steven Rostedt wrote:
> On Tue, 3 Aug 2021 15:52:00 -0700
> Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > For clarity, would you like a resend with the user mode code in the
> > description or would you like an in-thread example?
> 
> In thread example is fine. I'd just like to understand what exactly you
> plan on doing with it.
> 
> -- Steve
Internally we have single trace_events that actually represent many
events. We do this by having the payload start with an int that the eBPF
programs always probe first to get a sub-event ID. The sub-event ID then
determines the actual payload format. We typically use this pattern so
that a single eBPF program can turn on a whole class of events that
require tracing. An example is tracing all network-related errors, which
may span many events, each with its own payload.

For example:
struct packeterror {
	int id;
	int packetnumber;
	int errorcode;
};

struct connerror {
	int id;
	int connnumber;
	int ip4;
	int errorcode;
};

Both packeterror and connerror would be written to a trace_event with
a name like ms.net.errors. packeterror might have an id of 0, while
connerror would have an id of 1. The eBPF program or trace decoder checks
the first int and then either decodes the rest of the payload or skips
it. udiag sends user data as an eBPF context struct, so the cost of
probing user memory is deferred until the program sees an event that
warrants it. This limits system cost when tracing is enabled but only a
subset of the events is wanted.
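To make the decode step concrete, here is a rough user-space sketch of a
trace decoder for ms.net.errors. It is plain C over a captured payload
buffer, not an eBPF program, and the enum names and decode_net_error()
are made up for illustration; only the struct layouts and the 0/1 IDs
come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sub-event IDs from the example; the enum names are hypothetical */
enum net_error_id {
	NET_PACKET_ERROR = 0,
	NET_CONN_ERROR = 1,
};

struct packeterror {
	int id;
	int packetnumber;
	int errorcode;
};

struct connerror {
	int id;
	int connnumber;
	int ip4;
	int errorcode;
};

/* Every sub-event must lead with its ID for the probe-first scheme */
_Static_assert(offsetof(struct packeterror, id) == 0, "id must come first");
_Static_assert(offsetof(struct connerror, id) == 0, "id must come first");

/*
 * Decode one ms.net.errors payload. Returns the sub-event ID and stores
 * its error code, or returns -1 for short or unknown payloads, which the
 * caller simply skips.
 */
int decode_net_error(const void *data, size_t len, int *errorcode)
{
	int id;

	assert(data != NULL && errorcode != NULL);

	/* Probe only the leading int first */
	if (len < sizeof(id))
		return -1;
	memcpy(&id, data, sizeof(id));

	switch (id) {
	case NET_PACKET_ERROR: {
		struct packeterror pe;

		if (len < sizeof(pe))
			return -1;
		memcpy(&pe, data, sizeof(pe));
		*errorcode = pe.errorcode;
		return id;
	}
	case NET_CONN_ERROR: {
		struct connerror ce;

		if (len < sizeof(ce))
			return -1;
		memcpy(&ce, data, sizeof(ce));
		*errorcode = ce.errorcode;
		return id;
	}
	default:
		/* Unknown sub-event: skip without touching the rest */
		return -1;
	}
}
```

Using memcpy rather than casting the buffer avoids alignment and
aliasing issues when the payload comes from an arbitrary trace buffer.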

This pattern makes writing an eBPF program that revolves around common
data much easier. Internally we have macros that expand to the
enabled-page check followed by the write syscall. These macros are also
used post-compile to auto-generate eBPF probe statements, etc., making it
harder for developers to get the code wrong on either side.
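A minimal sketch of what such a macro might look like, assuming the
mmap'd enable page and per-event fd from the example below.
UDIAG_WRITE and udiag_write_if_enabled() are hypothetical names for
illustration; the actual internal macros aren't shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: pay for the write() syscall only when the kernel
 * has marked the event enabled in the shared page. Returns 0 when the
 * event is disabled, otherwise whatever write() returns.
 */
static inline ssize_t udiag_write_if_enabled(const char *page, int index,
					      int fd, const void *data,
					      size_t len)
{
	assert(page != NULL && index >= 0);

	if (!page[index])
		return 0;

	return write(fd, data, len);
}

/* Convenience macro so call sites cannot get the payload size wrong */
#define UDIAG_WRITE(page, index, fd, payload) \
	udiag_write_if_enabled((page), (index), (fd), &(payload), \
			       sizeof(payload))
```

Call sites then read as UDIAG_WRITE(page, index, fd, payload), and
because the macro supplies sizeof(payload) itself, the written size
can't drift from the payload type.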

Here's a simple example showing the general working order and flow we
use on top of udiag. The payload in this case is very simple (a single
int). Payloads in general can be anything; the kernel side doesn't force
users into the sub-event model. It works great for us, but other users
might not need it or might have their own ideas.

#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <errno.h>
#include <fcntl.h>

#define DIAG_IOC_MAGIC '*'
#define DIAG_IOCSREG _IOW(DIAG_IOC_MAGIC, 0, char*)
#define DIAG_IOCSDEL _IOW(DIAG_IOC_MAGIC, 1, char*)

int udiag_init(char **eventpage)
{
	int ret;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = open("/dev/udiag", O_RDWR);

	if (fd == -1) {
		ret = errno;
		goto out;
	}

	/* Map in single page for event enabled checking */
	char *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);

	if (page != MAP_FAILED) {
		*eventpage = page;
		ret = 0;
	} else {
		ret = errno;
	}

	close(fd);

out:
	return ret;
}

int udiag_event_open(const char *name, int *eventfd, int *eventindex)
{
	int ret;
	int fd;
	long status;

	fd = open("/dev/udiag", O_RDWR);

	if (fd == -1) {
		ret = errno;
		goto out;
	}

	/* Make data written to this fd log under the named event */
	status = ioctl(fd, DIAG_IOCSREG, name);

	if (status < 0) {
		ret = errno;
		close(fd);
		goto out;
	}

	/* Give caller back an fd for writing to this event */
	*eventfd = fd;

	/* Give caller back the index of the page to check if enabled */
	*eventindex = (int)status;

	ret = 0;

out:
	return ret;
}

int main(void)
{
	int err, fd, index, payload;
	char *page;

	err = udiag_init(&page);

	if (err != 0)
		return err;

	err = udiag_event_open("testevent", &fd, &index);

	if (err != 0)
		return err;

	payload = 0;

	printf("Press enter to write a testevent (Ctrl-D to quit)\n");

	/* Stop on EOF instead of spinning forever */
	while (getchar() != EOF) {
		/* Avoid write overhead when nothing is listening */
		if (page[index]) {
			payload++;

			if (write(fd, &payload, sizeof(payload)) < 0)
				perror("write");
			else
				printf("Logged %d\n", payload);
		} else {
			printf("Event is not being traced currently, enable via:\n");
			printf("echo 1 > /sys/kernel/debug/tracing/events/udiag/testevent/enable\n");
		}
	}

	close(fd);

	return 0;
}
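Combining the two pieces, a sub-event struct goes through the same fd
exactly like the single int above. This is a sketch only:
emit_conn_error() is a hypothetical helper, and it assumes page, index
and fd come from udiag_init() and udiag_event_open("ms.net.errors", ...)
as in the example.

```c
#include <assert.h>
#include <unistd.h>

/* Sub-event ID for connerror, matching the ms.net.errors example */
#define NET_CONN_ERROR 1

struct connerror {
	int id;
	int connnumber;
	int ip4;
	int errorcode;
};

/*
 * Hypothetical helper: emit a connerror sub-event. Returns 1 if the
 * event was written, 0 if tracing is off, -1 on a failed or short write.
 */
int emit_conn_error(const char *page, int index, int fd,
		    int connnumber, int ip4, int errorcode)
{
	struct connerror ce;

	assert(page != NULL && index >= 0);

	/* Check the enable byte before building the payload */
	if (!page[index])
		return 0;

	ce.id = NET_CONN_ERROR;
	ce.connnumber = connnumber;
	ce.ip4 = ip4;
	ce.errorcode = errorcode;

	if (write(fd, &ce, sizeof(ce)) != (ssize_t)sizeof(ce))
		return -1;

	return 1;
}
```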

Thanks,
-Beau