Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Mon, 10 Oct 2022 14:31:37 -0400

On 10/5/22 7:14 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Oct 04 2022, Jeff Hostetler via GitGitGadget wrote:

From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
to a "flex array" at the end of the context structure.

The `thread_name` field is a constant string that is constructed when
the context is created.  Using a (non-const) `strbuf` structure for it
caused some confusion in the past because it implied that someone
could rename a thread after it was created.

I think it's been long enough that we could use a reminder about the
"some confusion", i.e. if it was a bug report or something else.

That usage was not intended.  Changing it to a "flex array" will
hopefully make the intent more clear.

I see we had some back & forth back in the original submission, although
honestly I skimmed this this time around, had forgotten about that, and
had this pop out at me, and then found my earlier comments.

I see that exchange didn't end as well as I'd hoped[1], and hopefully we
can avoid that here. So having looked at this with fresh eyes maybe
these comments/questions help:

  * I'm unable to bridge the gap from (paraphrased) "we must change the
    type" to "mak[ing] the [read-only] intent more clear".

    I.e. if you go across the codebase and look at various non-const
    "char name[FLEX_ARRAY]" and add a "const" to them you'll find cases
    where we re-write the "FLEX_ARRAY" string, e.g. the one in archive.c
    is one of those (the first grep hit, I stopped looking for others at
    that point).

    Making it "const" will yield:
    
    archive.c: In function ‘queue_directory’:
    archive.c:206:29: error: passing argument 1 of ‘xsnprintf’ discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
      206 |         d->len = xsnprintf(d->path, len, "%.*s%s/", (int)base->len, base->buf, filename);

    So aside from anything else (and I may be misunderstanding this) why
    does changing it to a FLEX_ARRAY give us the connotation in the
    confused API user's mind that it shouldn't be messed with that the
    "strbuf" doesn't give us?
[...]

My change in how we store the thread-name in the thread context was JUST
to clarify that it should be treated as a constant string and that code
should not try to modify it.  There was a comment to that effect last
year -- that having it be a strbuf invited one to modify it, when that
was not the intent.

That was all I was trying to do here.  Just make it "not be a strbuf".
Perhaps I leapt too far by making it a flex-array.  I probably could
have just changed the field to a "char *" and detached it from the
(now local) strbuf.  That would give the same impression, right?
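
For illustration, the "char *" alternative could look roughly like the
sketch below.  It is standalone C, not git code: `struct demo_thread_ctx`,
`demo_ctx_create()`, `demo_ctx_free()` and the 64-byte scratch buffer are
hypothetical stand-ins, and a plain local buffer plays the role of the
(now local) strbuf.  In git itself the hand-off would presumably go
through strbuf_detach() instead of the manual copy shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tr2tls_thread_ctx; only the relevant fields. */
struct demo_thread_ctx {
	int thread_id;
	char *thread_name; /* built once at creation, not meant to change */
};

struct demo_thread_ctx *demo_ctx_create(const char *name_hint, int thread_id)
{
	char scratch[64]; /* assumption: arbitrary size for this sketch */
	struct demo_thread_ctx *ctx;
	size_t len;
	int n;

	assert(name_hint);

	/* Build the name locally, mirroring the "th%02d:" prefix logic. */
	if (thread_id)
		n = snprintf(scratch, sizeof(scratch), "th%02d:%s",
			     thread_id, name_hint);
	else
		n = snprintf(scratch, sizeof(scratch), "%s", name_hint);
	if (n < 0)
		return NULL;
	len = strlen(scratch); /* snprintf truncates to fit the buffer */

	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return NULL;
	ctx->thread_id = thread_id;

	/* "Detach": the context takes ownership of a private copy. */
	ctx->thread_name = malloc(len + 1);
	if (!ctx->thread_name) {
		free(ctx);
		return NULL;
	}
	memcpy(ctx->thread_name, scratch, len + 1);
	return ctx;
}

void demo_ctx_free(struct demo_thread_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->thread_name);
	free(ctx);
}
```

The field is no longer a strbuf, so nothing in the type suggests
appending to it or resetting it, at the cost of one extra allocation
compared to a flex array.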


[...]
  	/*
  	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -45,15 +56,6 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
  	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
  	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
  
-	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
-
-	strbuf_init(&ctx->thread_name, 0);
-	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, name_hint);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
-
  	pthread_setspecific(tr2tls_key, ctx);
  
  	return ctx;

I found this quote hard to follow because there are functional changes
there mixed up with code re-arrangement; consider leading with a commit
like:
[...]

Sorry about that.  Yes, there's a bit of churn here because I
needed to reorder the thread-name construction to be before we
allocated the context so that we'd know the buffer size.

And yes, I accidentally mixed in a functional change to move the
truncation to the perf backend.

I'll redo all of this.
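
The reordering described above can be sketched as follows: the name has
to exist before the allocation, because its length determines how much
memory the context needs.  This is a minimal standalone sketch, not the
patch itself; `struct demo_flex_ctx`, `demo_flex_ctx_create()` and the
64-byte scratch buffer are hypothetical, and it uses a plain C99
flexible array member where git would use its FLEX_ARRAY macro.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the context structure. */
struct demo_flex_ctx {
	int thread_id;
	char thread_name[]; /* flexible array member, must be last */
};

struct demo_flex_ctx *demo_flex_ctx_create(const char *name_hint,
					   int thread_id)
{
	char scratch[64]; /* assumption: arbitrary size for this sketch */
	struct demo_flex_ctx *ctx;
	size_t len;
	int n;

	assert(name_hint);

	/* Step 1: construct the name first so that its length is known. */
	if (thread_id)
		n = snprintf(scratch, sizeof(scratch), "th%02d:%s",
			     thread_id, name_hint);
	else
		n = snprintf(scratch, sizeof(scratch), "%s", name_hint);
	if (n < 0)
		return NULL;
	len = strlen(scratch);

	/* Step 2: allocate the context and the name in a single block. */
	ctx = calloc(1, sizeof(*ctx) + len + 1);
	if (!ctx)
		return NULL;
	ctx->thread_id = thread_id;

	/* Step 3: copy the finished name, including its NUL terminator. */
	memcpy(ctx->thread_name, scratch, len + 1);
	return ctx;
}
```

A single free(ctx) releases both the context and the name, which is
the main practical difference from the "char *" variant.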


[...]
<tries it out>

Anyway, if this area was actually performance critical and we *really
cared* about avoiding allocations wouldn't we want to skip both the
"strbuf" there and the "FLEX_ARRAY", and just save away the
"thread_hint" (which the caller hardcodes) and "thread_nr", and then
append on-the-fly?

I came up with the below to do that, it passes all tests, but contains
micro-optimizations that I don't think we need (e.g. I understood you
wanted to avoid printf, so it does that).

But I think it's a useful point of discussion. What test(s) do you have
where the "master" version, FLEX_ARRAY version, and just not strbuf
formatting the thing at all differ?
[...]

None of this was about micro-optimization.  I was just trying to get
the buffer away from a strbuf.  I still want it pre-formatted once
at thread-start, but that's it.

FWIW, I don't think having it formatted in each event helps anything.
It would have to go through sprintf on every message.  It's much better
to just format it once in the thread-start.
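
As a rough illustration of "format once, reuse per event", here is a
standalone sketch.  All names in it (`struct demo_event_ctx`,
`demo_thread_start()`, `demo_emit_event()`, the `format_calls` counter
and the buffer sizes) are hypothetical and exist only to make the
point visible; the real trace2 targets write their events differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_event_ctx {
	char thread_name[32];   /* formatted once in demo_thread_start() */
	size_t thread_name_len;
	int format_calls;       /* instrumentation for this sketch only */
};

/* Called once per thread: the only place the name goes through printf. */
void demo_thread_start(struct demo_event_ctx *ctx, const char *name_hint,
		       int thread_id)
{
	assert(ctx && name_hint);
	if (thread_id)
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "th%02d:%s", thread_id, name_hint);
	else
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "%s", name_hint);
	ctx->thread_name_len = strlen(ctx->thread_name);
	ctx->format_calls++;
}

/*
 * Called per event: copies the cached name verbatim, no formatting.
 * Returns the number of bytes written (excluding the NUL), or 0 if
 * "out" is too small.
 */
size_t demo_emit_event(const struct demo_event_ctx *ctx, const char *event,
		       char *out, size_t out_size)
{
	size_t event_len = strlen(event);
	size_t need = ctx->thread_name_len + 1 + event_len;

	if (need + 1 > out_size)
		return 0;
	memcpy(out, ctx->thread_name, ctx->thread_name_len);
	out[ctx->thread_name_len] = ' ';
	memcpy(out + ctx->thread_name_len + 1, event, event_len + 1);
	return need;
}
```

However many events a thread emits, the name is formatted exactly
once; each event only pays for a copy of the cached string.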


[...]
	diff --git a/json-writer.c b/json-writer.c
[...]
	+void jw_strbuf_add_thread_name(struct strbuf *out, const char *thread_hint,
	+			       int thread_id, int max_len)
	+{
[...]
	+}
	+
	+void jw_object_thread(struct json_writer *jw, const char *thread_hint,
	+		      int thread_id)
	+{
[...]
	+}
[...]

We should not do this.  Just format the name in thread-start and
let json-writer print the string as we have been.

Adding thread formatting to json-writer also violates a separation
of concerns.

I'll re-roll this commit completely.

thanks
Jeff