Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 07 Oct 2022 12:03:45 +0200

On Thu, Oct 06 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>
>> A cleaned up version of the test code I had on top of "master", RFC
>> because I may still be missing some context here. E.g. maybe there's a
>> plan to dynamically construct these thread names?
>
> That's nice to learn, indeed.
>
>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>> +			     int thread_id)
>> +{
>> +	object_common(jw, "thread");
>> +	strbuf_addch(&jw->json, '"');
>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>> +	strbuf_addch(&jw->json, '"');
>> +}
>
> ...
>
>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>  	}
>>  
>>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>> -		    event_name);
>> +	oldlen = buf->len;
>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>
> Having to do strbuf_addf() many times may negatively affect perf_*
> stuff, if this code is invoked in the hot path.  I however tend to
> treat anything that involves an I/O not performance critical, and
> this certainly falls into that category.

Yes, and that function already called strbuf_addf() 5-7 times, this adds
one more, but only if "thread_id" is > 0.

The reason I added jw_object_string_thread() was to avoid the malloc() &
free() of a temporary "struct strbuf", it would have been more
straightforward to call jw_object_string() like that.

I don't think anyone cares about the raw performance of the "perf"
output, but the "JSON" one needs to be fast(er).

But even that output will malloc()/free() for each line it emits, and
often multiple times within one line (e.g. each time we format a
double).

So if we do want to optimize this in terms of memory use the lowest
hanging fruit seems to be to just have a per-thread "scratch" buffer
we'd write to, we could also observe that we're writing to a file and
just directly write to it in most cases (although we'd need to be
careful to write partial-and-still-invalid JSON lines in that case...).