Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 10 2022, Jeff Hostetler wrote:

> On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Oct 06 2022, Junio C Hamano wrote:
>> 
>>> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>>>
>>>> A cleaned up version of the test code I had on top of "master", RFC
>>>> because I may still be missing some context here. E.g. maybe there's a
>>>> plan to dynamically construct these thread names?
>>>
>>> That's nice to learn, indeed.
>>>
>>>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>>>> +			     int thread_id)
>>>> +{
>>>> +	object_common(jw, "thread");
>>>> +	strbuf_addch(&jw->json, '"');
>>>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>>>> +	strbuf_addch(&jw->json, '"');
>>>> +}
>>>
>>> ...
>>>
>>>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>>>   	}
>>>>     	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>>>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>>>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>>>> -		    event_name);
>>>> +	oldlen = buf->len;
>>>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>>>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
>>>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>>>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>>>
>>> Having to do strbuf_addf() many times may negatively affect perf_*
>>> stuff, if this code is invoked in the hot path.  I however tend to
>>> treat anything that involves an I/O not performance critical, and
>>> this certainly falls into that category.
>> Yes, and that function already called strbuf_addf() 5-7 times, this
>> adds
>> one more, but only if "thread_id" is > 0.
>> The reason I added jw_object_string_thread() was to avoid the
>> malloc() &
>> free() of a temporary "struct strbuf", it would have been more
>> straightforward to call jw_object_string() like that.
>> I don't think anyone cares about the raw performance of the "perf"
>> output, but the "JSON" one needs to be fast(er).
>> But even that output will malloc()/free() for each line it emits,
>> and
>> often multiple times within one line (e.g. each time we format a
>> double).
>> So if we do want to optimize this in terms of memory use the lowest
>> hanging fruit seems to be to just have a per-thread "scratch" buffer
>> we'd write to, we could also observe that we're writing to a file and
>> just directly write to it in most cases (although we'd need to be
>> careful to write partial-and-still-invalid JSON lines in that case...).
>> 

I left more extensive commentary in the side-thread in
https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@xxxxxxxxxxxxxxxxxxx/,
just a quick reply here.

> WRT optimizing memory usage.  We're talking about ~25 byte buffer
> per thread.  Most commands execute in 1 thread -- if they read the
> index they may have ~10 threads (depending on the size of the index
> and if preload-index is enabled).  So, I don't think we really need
> to optimize this.  Threading is used extensively in fsmonitor-daemon,
> but it creates a fixed thread-pool at startup, so it may have ~12
> threads.  Again, not worth optimizing for the thread-name field.

Yes, I agree it's not worth optimizing.

The reason for commenting on this part is that it isn't clear to me why
your proposed patch then isn't doing the more obvious "it's not worth
optimizing" pattern, per Junio's [1] comment on the initial version.

The "flex array" method is seemingly taking pains to reduce the runtime
memory use of these by embedding this string in the space reserved for
the struct.

So it's just meant as a question for you & the proposed patch.

> Now, if you want to optimize over all trace2 events (a completely
> different topic), you could create a large scratch strbuf buffer in
> each thread context and use it so that we don't have to malloc/free
> during each trace message.  That might be worth while.

*nod*

> We must not do partial writes to the trace2 files as we're
> constructing fields.  The trace2 files are opened with O_APPEND
> so that we get the atomic lseek(2)+write(2) so that lines get
> written without overwrites when multiple threads and/or processes
> are tracing.
>
> Also, when writing to a named pipe, we get "message" semantics
> on write() boundaries, which makes post-processing easier.

*nod*

1. https://lore.kernel.org/git/xmqq8rwcjttq.fsf@gitster.g/
2. https://lore.kernel.org/git/RFC-patch-1.1-8563d017137-20221007T010829Z-avarab@xxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux