Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Mon, 10 Oct 2022 15:16:45 -0400






On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:

On Thu, Oct 06 2022, Junio C Hamano wrote:

Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

A cleaned up version of the test code I had on top of "master", RFC
because I may still be missing some context here. E.g. maybe there's a
plan to dynamically construct these thread names?

That's nice to learn, indeed.

+void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
+			     int thread_id)
+{
+	object_common(jw, "thread");
+	strbuf_addch(&jw->json, '"');
+	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
+	strbuf_addch(&jw->json, '"');
+}

...

@@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
  	}
  
  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
-	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
-		    event_name);
+	oldlen = buf->len;
+	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
+	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
+	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
+		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);

Having to do strbuf_addf() many times may negatively affect perf_*
stuff, if this code is invoked in the hot path.  I however tend to
treat anything that involves an I/O not performance critical, and
this certainly falls into that category.

Yes, and that function already called strbuf_addf() 5-7 times, this adds
one more, but only if "thread_id" is > 0.

The reason I added jw_object_string_thread() was to avoid the malloc() &
free() of a temporary "struct strbuf", it would have been more
straightforward to call jw_object_string() like that.

I don't think anyone cares about the raw performance of the "perf"
output, but the "JSON" one needs to be fast(er).

But even that output will malloc()/free() for each line it emits, and
often multiple times within one line (e.g. each time we format a
double).

So if we do want to optimize this in terms of memory use the lowest
hanging fruit seems to be to just have a per-thread "scratch" buffer
we'd write to, we could also observe that we're writing to a file and
just directly write to it in most cases (although we'd need to be
careful to write partial-and-still-invalid JSON lines in that case...).


WRT optimizing memory usage.  We're talking about ~25 byte buffer
per thread.  Most commands execute in 1 thread -- if they read the
index they may have ~10 threads (depending on the size of the index
and if preload-index is enabled).  So, I don't think we really need
to optimize this.  Threading is used extensively in fsmonitor-daemon,
but it creates a fixed thread-pool at startup, so it may have ~12
threads.  Again, not worth optimizing for the thread-name field.

Now, if you want to optimize over all trace2 events (a completely
different topic), you could create a large scratch strbuf buffer in
each thread context and use it so that we don't have to malloc/free
during each trace message.  That might be worth while.


We must not do partial writes to the trace2 files as we're
constructing fields.  The trace2 files are opened with O_APPEND
so that we get the atomic lseek(2)+write(2) so that lines get
written without overwrites when multiple threads and/or processes
are tracing.

Also, when writing to a named pipe, we get "message" semantics
on write() boundaries, which makes post-processing easier.

Jeff