On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Oct 06 2022, Junio C Hamano wrote:
Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
A cleaned up version of the test code I had on top of "master", RFC
because I may still be missing some context here. E.g. maybe there's a
plan to dynamically construct these thread names?
That's nice to learn, indeed.
+void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
+ int thread_id)
+{
+ object_common(jw, "thread");
+ strbuf_addch(&jw->json, '"');
+ jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
+ strbuf_addch(&jw->json, '"');
+}
...
@@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
}
strbuf_addf(buf, "d%d | ", tr2_sid_depth());
- strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
- ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
- event_name);
+ oldlen = buf->len;
+ jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
+ padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
+ strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
+ TR2FMT_PERF_MAX_EVENT_NAME, event_name);
Having to do strbuf_addf() many times may negatively affect perf_*
stuff, if this code is invoked in the hot path. I however tend to
treat anything that involves an I/O not performance critical, and
this certainly falls into that category.
Yes, and that function already called strbuf_addf() 5-7 times, this adds
one more, but only if "thread_id" is > 0.
The reason I added jw_object_string_thread() was to avoid the malloc() &
free() of a temporary "struct strbuf", it would have been more
straightforward to call jw_object_string() like that.
I don't think anyone cares about the raw performance of the "perf"
output, but the "JSON" one needs to be fast(er).
But even that output will malloc()/free() for each line it emits, and
often multiple times within one line (e.g. each time we format a
double).
So if we do want to optimize this in terms of memory use the lowest
hanging fruit seems to be to just have a per-thread "scratch" buffer
we'd write to, we could also observe that we're writing to a file and
just directly write to it in most cases (although we'd need to be
careful to write partial-and-still-invalid JSON lines in that case...).
WRT optimizing memory usage. We're talking about ~25 byte buffer
per thread. Most commands execute in 1 thread -- if they read the
index they may have ~10 threads (depending on the size of the index
and if preload-index is enabled). So, I don't think we really need
to optimize this. Threading is used extensively in fsmonitor-daemon,
but it creates a fixed thread-pool at startup, so it may have ~12
threads. Again, not worth optimizing for the thread-name field.
Now, if you want to optimize over all trace2 events (a completely
different topic), you could create a large scratch strbuf buffer in
each thread context and use it so that we don't have to malloc/free
during each trace message. That might be worth while.
We must not do partial writes to the trace2 files as we're
constructing fields. The trace2 files are opened with O_APPEND
so that we get the atomic lseek(2)+write(2) so that lines get
written without overwrites when multiple threads and/or processes
are tracing.
Also, when writing to a named pipe, we get "message" semantics
on write() boundaries, which makes post-processing easier.
Jeff