Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Wed, 12 Oct 2022 09:31:09 -0400

On 10/11/22 9:31 AM, Ævar Arnfjörð Bjarmason wrote:

On Mon, Oct 10 2022, Jeff Hostetler wrote:

On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Oct 06 2022, Junio C Hamano wrote:

Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

A cleaned up version of the test code I had on top of "master", RFC
because I may still be missing some context here. E.g. maybe there's a
plan to dynamically construct these thread names?

[...]

I left more extensive commentary in the side-thread in
https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@xxxxxxxxxxxxxxxxxxx/,
just a quick reply here.

WRT optimizing memory usage.  We're talking about ~25 byte buffer
per thread.  Most commands execute in 1 thread -- if they read the
index they may have ~10 threads (depending on the size of the index
and if preload-index is enabled).  So, I don't think we really need
to optimize this.  Threading is used extensively in fsmonitor-daemon,
but it creates a fixed thread-pool at startup, so it may have ~12
threads.  Again, not worth optimizing for the thread-name field.

Yes, I agree it's not worth optimizing.

The reason for commenting on this part is that it isn't clear to me why
your proposed patch then isn't doing the more obvious "it's not worth
optimizing" pattern, per Junio's [1] comment on the initial version.

The "flex array" method is seemingly taking pains to reduce the runtime
memory use of these by embedding this string in the space reserved for
the struct.

So it's just meant as a question for you & the proposed patch.

I think we're converging on some common understanding (and I
think we've gone around on this topic more than enough).  :-)

I really was just trying to get rid of the strbuf and make it
a fixed string -- I chose a flex-array rather than just detaching
the buffer from a local in the thread-start code.  I should have
done the latter.  I saw the flex-array as a fixed-size object
that can't be replaced or extended (without recreating the
thread-local storage) -- yes, people could overwrite existing
bytes in-place in the flex-array, but who does that??

I understood what you were asking (illustrated in your RFC).
That is, I understood the "what/how" you wanted to do to refactor /
redesign the field, but I couldn't understand the "why".  That
is, why you've taken such interest in this field (and such
a relatively unimportant change).  We've spent nearly a week
discussing it and we both agree that the optimization that I
didn't suggest isn't worth doing.  (I'm paraphrasing slightly.) :-)

So, rather than continuing with the back-n-forth, let me skip
over the remaining questions in this thread and prepare a re-roll.
Hopefully, I can simplify and more clearly explain the method to
my madness and we can move on.

Now, if you want to optimize over all trace2 events (a completely
different topic), you could create a large scratch strbuf buffer in
each thread context and use it so that we don't have to malloc/free
during each trace message.  That might be worth while.

*nod*

I'll make a note to revisit this idea in a future series.

Thanks
Jeff