On Fri, 18 Feb 2022 13:45:44 -0500 Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > Hello! > > I am seeing something weird with trace instances, that I don't see > with regular non-instance traces. > > If I enable some sched events and set the buffer size to something > tiny, like 1KB, I am expecting the buffer to rotate quickly as it > overflows, but I don't see that: > > watch "cat trace | head -n 1" and I see the following > > https://paste.googleplex.com/6259651512369152?raw > > and if I do some heavy workloads, I still see that the top rows are > not moving and are stuck showing the same thing. I am expecting the > buffer to drop old events as it is being overwritten/rotation. > > However if I do a tail on the trace, I do see updates. So tracing is > in progress... > > I did the exact same experiment on non-instance global tracing and it > works fine. I confirmed that 'overwrite' is set in the trace_options > of the instance (and non-instance). > > This could be some trace formatting issue or another bug? Note I am > reading the tracefs trace file directly and not using trace-cmd for > this. It's not a bug it's a feature ;-) The "trace" file is not a consumer file. That is, it does not clear out the ring buffer when you read it. If you want that, use trace_pipe. What you are seeing is the effect of the implementation of the ring buffer. As the ring buffer is broken up into "sub buffers" which are currently a page in size. Although Tzvetomir has RFC patches to change that: https://patchwork.kernel.org/project/linux-trace-devel/list/?series=594661&state=* If you pause tracing (echo 0 > tracing_on) you can read the trace file over and over and still get the same result. That is because it uses an iterator to read over the ring buffer and does not modify it. But if you were to use trace_pipe, it consumes the tracing data, and it does so by swapping out the sub buffers from the ring buffer and giving it to the user. Now to be able to do things like splice, and zero copy from the ring buffer, when you read the ring buffer it swaps out a sub buffer from the writable portion of the ring buffer and makes it into the "reader_page". The reader page is now own by the reader, and will never be written to. This allows the reader to take its time to read it and not worry about being overwritten by the writer. Now what you are seeing, is that a reader page was swapped out of the ring buffer. This means that it will never be written to again, as it belongs to the reader (it's the reader page). And since you are using the "trace" file and not the "trace_pipe" file, it will not even discard this page. But as you have not turned off tracing, the rest of the ring buffer is still being updated. What happens is, the start of the trace file will show you the reader page (never be updated, and will always stay the same), and then walk into the writable section of the ring buffer, which will constantly be updated. This gives you exactly what you are seeing. Like I said, it's a feature! Now, why is it not happening in the top level file? It's because if the reader page has not be read yet, the iterator may go right into the writable portion and skip over the reader page as it is empty. If you did something that causes the reader page to be filled, it will give you what you see. So this isn't a difference between the instances and top level, it's a difference between what you have have done to the instances before reading the trace file, which you did not do for the top level buffer. They both use the same code, so they should be acting identically. -- Steve