Re: [PATCH] trace2: log progress time and throughput

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/12/2020 5:44 PM, Emily Shaffer wrote:
> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@xxxxxxxxxx>
> ---
> One note: by putting trace collection into the progress library, we end
> up with data events which have titles like "Receiving objects" - not
> very machine-parseable. An alternative might be to ask for a
> machine-readable title in the progress struct, but I didn't think it was
> worth the code churn. However, I don't have experience with processing
> the trace data after it's been collected, so if this is a bigger problem
> than I think, please say so and I'll figure something out.
> 
> CI run here, although it failed on the same error Junio noted today[1]:
> https://github.com/nasamuffin/git/runs/668457062
> 
>  - Emily
> 
> [1]: https://lore.kernel.org/git/xmqqtv0kc2q1.fsf@xxxxxxxxxxxxxxxxxxxxxx
> 
>  progress.c                  | 17 +++++++++++++++++
>  t/t0500-progress-display.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/progress.c b/progress.c
> index 75633e9c5e..6d2dcff0b6 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -265,6 +265,7 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
>  	progress->title_len = utf8_strwidth(title);
>  	progress->split = 0;
>  	set_progress_signal();
> +	trace2_region_enter("progress", title, the_repository);
>  	return progress;
>  }
>  
> @@ -320,6 +321,22 @@ void stop_progress(struct progress **p_progress)
>  {
>  	finish_if_sparse(*p_progress);
>  
> +	if (p_progress && *p_progress) {
> +		trace2_data_intmax("progress", the_repository, "total_objects",
> +				   (*p_progress)->total);

Should this be "total_objects"? Progress can iterate over lots of things,
such as cache entries. Perhaps leave it as "total"?

> +
> +		if ((*p_progress)->throughput)
> +			trace2_data_intmax("progress", the_repository,
> +					   "total_bytes",
> +					   (*p_progress)->throughput->curr_total);

I like the extra detail here for the specific kind of progress used in
network transfer.

> +	}
> +
> +	trace2_region_leave("progress",
> +			    p_progress && *p_progress
> +				? (*p_progress)->title
> +				: NULL,
> +			    the_repository);
> +
>  	stop_progress_msg(p_progress, _("done"));
>  }

This trace2_region_leave() needs to be conditional on the progress
being non-null. The pattern used by consumers of the progress API is:

	if (my_progress_condition)
		start_progress(&progress);

	do {
		display_progress(&progress, count);
	} while (condition);

	stop_progress(&progress);

The condition to show progress or not is conditional on an option that
is external to the progress API.

The fix for this patch is simple: make the trace2_region_leave() be
conditional on "p_progress && *p_progress".

This leads to an extra problem: if a user uses an option such as "--quiet",
then the trace2 regions won't appear at all. This becomes even more important
when thinking about scripts or tools that have stderr as a non-TTY, which
disables progress most of the time.

It's best to have trace2 data be consistent across commands. I think this can
be accomplished, but it is a more invasive change to the rest of the codebase.
It requires invoking the progress API in all cases, and having the progress
API conditionally initialize the progress struct. The new pattern would look
like this:

	start_progress(&progress, my_progress_condition);

	do {
		display_progress(&progress, count);
	} while (condition);

	stop_progress(&progress);

Then, start_progress() (and variants) could always start the trace2 region,
and stop_progress() could always end the trace2 region.

>  
> diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> index d2d088d9a0..1ed1df351c 100755
> --- a/t/t0500-progress-display.sh
> +++ b/t/t0500-progress-display.sh
> @@ -283,4 +283,30 @@ test_expect_success 'cover up after throughput shortens a lot' '
>  	test_i18ncmp expect out
>  '
>  
> +test_expect_success 'progress generates traces' '
> +	cat >in <<-\EOF &&
> +	throughput 102400 1000
> +	update
> +	progress 10
> +	throughput 204800 2000
> +	update
> +	progress 20
> +	throughput 307200 3000
> +	update
> +	progress 30
> +	throughput 409600 4000
> +	update
> +	progress 40
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace.event" test-tool progress --total=40 \
> +		"Working hard" <in 2>stderr &&
> +
> +	# t0212/parse_events.perl intentionally omits regions and data.
> +	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> +	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> +	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
> +	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
> +'
> +

Thanks for the test! While this is sufficient to test the trace2 in the
happy path, I think that if you run a Git command such as `git commit-graph write`
that automatically quiets progress in the test suite (non-TTY stderr) you will
find the trace2 logs malformed due to excess end regions.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux