On Mon, 2015-01-05 at 13:12 +0000, Peter Zijlstra wrote:
> On Thu, Nov 06, 2014 at 04:51:57PM +0000, Pawel Moll wrote:
> > This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by
> > any process to inject custom data into perf data stream as a new
> > PERF_RECORD_UEVENT record, if such process is being observed or if it
> > is running on a CPU being observed by the perf framework.
> >
> > The prctl call takes the following arguments:
> >
> > 	prctl(PR_TASK_PERF_UEVENT, type, size, data, flags);
> >
> > - type: a number meaning to describe content of the following data.
> >   Kernel does not pay attention to it and merely passes it further in
> >   the perf data, therefore its use must be agreed between the events
> >   producer (the process being observed) and the consumer (performance
> >   analysis tool). The perf userspace tool will contain a repository of
> >   "well known" types and reference implementation of their decoders.
> > - size: Length in bytes of the data.
> > - data: Pointer to the data.
> > - flags: Reserved for future use. Always pass zero.
> >
> > Perf context that are supposed to receive events generated with the
> > prctl above must be opened with perf_event_attr.uevent set to 1. The
> > PERF_RECORD_UEVENT records consist of a standard perf event header,
> > 32-bit type value, 32-bit data size and the data itself, followed by
> > padding to align the overall record size to 8 bytes and optional,
> > standard sample_id field.
> >
> > Example use cases:
> >
> > - "perf_printf" like mechanism to add logging messages to perf data;
> >   in the simplest case it can be just
> >
> > 	prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0);
> >
> > - synchronisation of performance data generated in user space with the
> >   perf stream coming from the kernel.
> >   For example, the marker can be
> >   inserted by a JIT engine after it generated portion of the code, but
> >   before the code is executed for the first time, allowing the
> >   post-processor to pick the correct debugging information.
>
> The thing I remember being raised was a unified means of these msgs
> across perf/ftrace/lttng. I am not seeing that mentioned.

Right. I was considering the "well known types repository" an attempt in
this direction. Having said that - ftrace also takes a random blob as
the trace marker, so the unification has to happen in userspace anyway.
I'll have a look at what LTTng has to say in this respect.

> Also, I would like a stronger rationale for the @type argument, if it
> has no actual meaning why is it separate from the binary msg data?

Valid point. Without type 0 defined as a string, it doesn't bring
anything into the equation. I just have a gut feeling that sooner rather
than later we will want to split the messages somehow. Maybe we should
make it a "reserved for future use, use 0 now" field?

	 * struct {
	 *	struct perf_event_header	header;
	 *	u32				__reserved; /* always 0 */
	 *	u32				size;
	 *	char				data[size];
	 *	char				__padding[-size & 7];
	 *	struct sample_id		sample_id;
	 * };

or, probably even better, make it a version value at a known offset
(currently always 1, with just size and random sized data following).

	 * struct {
	 *	struct perf_event_header	header;
	 *	u32				version; /* use 1 */
	 *	u32				size;
	 *	char				data[size];
	 *	char				__padding[-size & 7];
	 *	struct sample_id		sample_id;
	 * };

That way we can mutate the user events format without too much pain -
the parsers will simply complain about an unknown format if one occurs,
and with the size of the record in the header, it is possible to skip
it.

Pawel
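
For illustration, the "complain and skip" behaviour described for the
versioned layout could look roughly like the sketch below. It is not
taken from the patch. The record type value (UEVENT_RECORD_TYPE), the
helper names and the local struct hdr (a stand-in mirroring the three
fields of struct perf_event_header) are all assumptions made for the
example, and the optional sample_id is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring struct perf_event_header (type, misc, size). */
struct hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* whole record in bytes, header included */
};

#define UEVENT_RECORD_TYPE	0x7fff	/* hypothetical, not from the patch */
#define UEVENT_VERSION		1	/* "use 1" in the proposed layout */
#define UEVENT_FIXED		(sizeof(struct hdr) + 8) /* header + version + size */

/* Padding after `size` data bytes, as in char __padding[-size & 7]. */
static uint32_t uevent_padding(uint32_t size)
{
	return -size & 7;
}

/* Write one record (no sample_id) into buf; returns its length, 0 if it does not fit. */
static size_t uevent_build(uint8_t *buf, size_t cap, uint32_t version,
			   const void *data, uint32_t size)
{
	uint64_t total = (uint64_t)UEVENT_FIXED + size + uevent_padding(size);
	struct hdr h = { UEVENT_RECORD_TYPE, 0, 0 };

	assert(sizeof(struct hdr) == 8);
	if (total > cap || total > UINT16_MAX)
		return 0;
	h.size = (uint16_t)total;
	memset(buf, 0, (size_t)total);		/* also zeroes the padding */
	memcpy(buf, &h, sizeof(h));
	memcpy(buf + sizeof(h), &version, 4);
	memcpy(buf + sizeof(h) + 4, &size, 4);
	memcpy(buf + UEVENT_FIXED, data, size);
	return (size_t)total;
}

/*
 * Inspect the record at buf.
 * Returns  1: known version, *data and *size describe the payload;
 *          0: unknown version, caller should complain and skip;
 *         -1: truncated or inconsistent record.
 * For return values 0 and 1, *next holds header.size, so the caller can
 * step over the record without understanding its contents.
 */
static int uevent_parse(const uint8_t *buf, size_t len,
			const uint8_t **data, uint32_t *size, size_t *next)
{
	struct hdr h;
	uint32_t version, sz;

	if (len < UEVENT_FIXED)
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.size < UEVENT_FIXED || h.size > len)
		return -1;
	*next = h.size;

	memcpy(&version, buf + sizeof(h), 4);
	if (version != UEVENT_VERSION)
		return 0;

	memcpy(&sz, buf + sizeof(h) + 4, 4);
	if ((uint64_t)sz + uevent_padding(sz) > h.size - UEVENT_FIXED)
		return -1;
	*data = buf + UEVENT_FIXED;
	*size = sz;
	return 1;
}
```

A reader loop would call uevent_parse() on each record of this type,
print a warning when it returns 0, and advance by *next in either case.
With the 8-byte header plus the two u32 fields, data always starts at
offset 16, so -size & 7 is enough to keep the record 8-byte aligned.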