RE: Format of trace.dat flyrecord section

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear Tzvetomir,
Thanks for your answer. Everything is clear to me now.

In the rest of this mail, I'd like to share to the community some considerations about how to structure the fly record section in a trace.dat file.
This resume is not complete at all: it only includes points that were not clear to me before exchanging with Mr. Stoyanov and Mr. Rostedt,. Maybe some people may also find it useful.

- In a Flyrecord section (ID: 3),  data are divided per CPU. After the section's header, a padding must be present to ensure the real data to start at a offset that is page-aligned (e.g., 4096 is a common situation).
- Let's imagine that, for CPU 0, data starts at offset 4096 of the trace.dat
- Real tracing data are divided in "pages"/buffers of 4096 bytes.  Each page starts with a header, whose format is explicated in "header_page" file. 

	field: u64 timestamp;	offset:0;	size:8;	signed:1;
	field: local_t commit;	offset:8;	size:8;	signed:1;
	field: int overwrite;	offset:8;	size:1;	signed:1;
	field: char data;	offset:16;	size:4080;	signed:1;

- In header_page, pay attention that fields `commit` and `overwrite` are mutually exclusive (same offset!).
- In header_page, timestamp is a 64-bit number that represent the base timestamp for all the entries of that page. Let's suppose that we write 9000000000 (9 seconds) on it: this `9000000000` will not eventually displayed on kernelshark. It's just a starting point. "data" is the buffer in which the real entries (several) will be written.
- Then, it's time to write the real entries (also named "events". Entries are composed of 2 parts: 1) a header  and 2) rest of the data.
- Events' header is explained in the file "header_event". This is the same as Linux's ringbuffer format: https://elixir.bootlin.com/linux/latest/source/include/linux/ring_buffer.h#15 ->

# compressed entry header
	type_len    :    5 bits
	time_delta  :   27 bits
	array       :   32 bits

	padding     : type == 29
	time_extend : type == 30
	time_stamp : type == 31
	data max type_len  == 28

- You pass both headers files within the section 16.
- Events'/Entries' header only explicit the timestamp of the entry. I will not explain it since it's very clear by previous exchanges with Mr. Stoyanov and Mr. Rostedt and from Linux source code. However, let me clarify some magic points.
- Let's take the base of  9 seconds. We want to add 20 ms to it. In this case, the reasoning shall be: "9s + 20 ms is represented on 24 bytes (6*4). So, I need to write 6 on the first 5 bits (type_len), 20000000 on the remaining 27 bits (time delta)" (for a total of 4 bytes) AND NOT WRITING the 32 bits of array: I don't use it, so I don't write it. Note: 6 bytes are required to contain the value of the BASE (of 9s) + the delta (of 20 ms). In this way, kernelshark will display 9.02 s.
- If we need to add another entry, keep in mind that the time delta is respect to the previous entry, and not the base anymore.
- At this point we understood how to write the timestamps, but a trace is composed of many other fields: PID, function address, caller, etc.
- First of all, we need to pick a format. Naturally we can write our own format, but, if you want to display this trace with kernelshark without writing kernelshark plugins, you need to respect one of the existing formats. For instance, let's suppose to use "function tracer" of ftrace. In debugfs, look for: /sys/kernel/debug/tracing/events/ftrace/function/format

ID: 1
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long parent_ip;	offset:16;	size:8;	signed:0;

print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip

- If you didn't already do it, you need to pass this format to the trace.dat through section 18.
- common_type is the ID of the event (notice: for the previous event, it's 1, as we can notice at the beginning of the file)
- common_flags and common_preempt_count impacts the field "latency" of kernelshark.
- common_pid is, indeed, the PID
- ip and parent_ip are the trace function's address and his caller.
- The final print will impact the field "info" of kernelshark.

Hopefully this will be useful for the ftrace-users community.
Bests,
Matteo

-----Original Message-----
From: Tzvetomir Stoyanov [mailto:tz.stoyanov@xxxxxxxxx] 
Sent: Friday, June 17, 2022 5:26 PM
To: Matteo Bertolino <matteo.bertolino@xxxxxxxxxx>
Cc: linux-trace-users@xxxxxxxxxxxxxxx; Steven Rostedt <rostedt@xxxxxxxxxxx>
Subject: Re: Format of trace.dat flyrecord section

On Fri, Jun 17, 2022 at 4:08 PM Matteo Bertolino <matteo.bertolino@xxxxxxxxxx> wrote:
>
> Dear Mr Rostedt and mr Stoyanov,
> thanks for your warming welcoming and for your answers.
>
> I had clearer ideas now. If possible, I would like to ask some clarifications.
> Once everything is clear in my mind, I will write a resume with an example for the mailing list, so it can be usefulf for future experiments.
>
> Let's suppose that I want to write some events like the "function tracer":
> Thanks to your explanation, I successfully communicated in the header this format. This is confirmed by:
>
> ```
> dreamer@dreamer-VB:~/Desktop/mydotdat/input$ trace-cmd dump --events -i ../mytrace_v7.dat
>         [Section 18 @ 617: "Unknown", flags 0x0, 485 bytes]
>         [Events format, 1 systems]
> name: function
> ID: 1
> format:
>         field:unsigned short common_type;       offset:0;       size:2; signed:0;
>         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
>         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
>         field:int common_pid;   offset:4;       size:4; signed:1;
>
>         field:unsigned long ip; offset:8;       size:8; signed:0;
>         field:unsigned long parent_ip;  offset:16;      size:8; signed:0;
>
> print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip ```
>
> Now, in my trace.dat, I want to write the real information.
> If you permit me, I'd like to continue this discussion with a concrete example.
>
> Let's suppose that I want to represent the following information:
>
> Event: function trace event
> Timestamp 9 seconds
> PID: 7777
> Function address: 0x2266D
> Parent: 0x2266E
>
> I start by filling the header_event information:
> - Writing 900000000000 on 8 bytes (timestamp).
> - Writing `24`in 8 bytes (commit) --> event data for event ID are on 24 bytes, indeed: common_* are on 8 bytes, ip and parent ip on 8.
> - Ignoring overwrite: indeed, I already used commit.
>
> Then, I have the 4080 bytes of payload event, but I use only 24 of them, as expressed in commit.
> - common_type is 1, (2 bytes), because I want a function event
> - common_flags is 0 (1 byte)
> - common_preempt_count is 0 (1 byte)
> - common_pid is 7777 (4 bytes)
> - ip is 0x2266D (8 bytes)
> - parent_ip is 0x2266E (8 bytes)
> - the remaining 4096 - 24 bytes are of padding, so I can complete the 4080 bytes of data.
>
> Unfortunately, in this way, only CPU number and timestamp are correctly displayed on the frontend (kernelshark).
> Event is marked as "unknown" and PID is wrong.
> Timestamp astonishes me too: to correctly display it, I just need to write 900000000000 on 8 bytes.
> Trying to put "1" on time_type (5bits), 0 on delta (27 bits) and 900000000000 on array (1*4 bytes), the timestamp is not correct.

Hi Matteo,
The 'type_len' should be 6 for your event, as the payload is 24 bytes (6*4). The 'time_delta' is the delta from the timestamp written in the page header, i.e. if you put 199 in 'time_delta' the time stamp of the event is 900000000199. One very important note - take care of the endian, used for all numbers in the file. The endian of the file can be different from the endian of the host, where the file is displayed.
That's why that information is written in the trace.dat - in the very beginning of the file.

>
> Thus, I think I am doing something wrong. May you help me in correcting this example, please?
>
> Another thing that did let me with a doubt is the following one: do I need a page (4096 bytes) for an event?
> In other words: the 4096-24 bytes of padding to complete the 4080 bytes of "data" are mandatory to express another event?
>
> Thanks for your help,
> Matteo
>
> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@xxxxxxxxxxx]
> Sent: Thursday, June 16, 2022 3:04 PM
> To: Matteo Bertolino <matteo.bertolino@xxxxxxxxxx>
> Cc: linux-trace-users@xxxxxxxxxxxxxxx
> Subject: Re: Format of trace.dat flyrecord section
>
>
> OK, this is a claws-mail bug. It added the '\' in the email address when doing a reply all.
>
> -- Steve
>
>
> On Thu, 16 Jun 2022 09:02:37 -0400
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > This is just a test to see if the added '\' was due to a claws-mail 
> > bug or somehow I did it.
> >
> > -- Steve
>


--
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center




[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux