Re: [PATCH 0/2] Set GIT_TRACE2_EVENT_NESTING in test-lib.sh

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 11/30/21 10:34 AM, Derrick Stolee wrote:
On 11/29/2021 1:40 PM, Jeff King wrote:
On Mon, Nov 29, 2021 at 01:47:44PM +0000, Derrick Stolee via GitGitGadget wrote:

As reported by Ævar [1] and diagnosed by Peff in a reply, the default
GIT_TRACE2_EVENT_NESTING is set so low that tests that look for trace2
events can start failing due to an added trace2 region. This can even be
subtle, including the fact that the progress API adds trace2 regions when in
use (which can depend on the verbose output of a test script).

I think this is a good change for fixing the immediate problem of the
test suite failing.

But I have to wonder if this is masking a problem that real users will
see. Aren't we silently discarding trace2 output that could be useful to
somebody debugging or trying to get performance metrics?

I.e., should we be bumping the internal nesting max of 2 to something
higher? If that would that cause problems with existing metrics, should
we be designing new metrics to avoid nesting?

One thing this makes me think about is that we might want to
have items such as trace2_data_intmax() and trace2_data_string()
not be subject to the nesting limit. Only have the nesting apply
to the regions (which nest).

CC'ing Jeff Hostetler for his thoughts on that idea. We will have
a discussion offline to see if the progress regions have started
to cause inconsistencies in our existing telemetry stream of
internal users.

Thanks,
-Stolee



yeah, I set a fairly conservative nesting limit.  my fear was that
someone would add a region or data event to something that recurses
deeply -- like treewalking or directory scanning -- which in a monorepo
would be bad since we might get millions of rows of data.

most of the useful timing data comes from the higher level regions,
such as we spent x% scanning tracked files and y% scanning untracked
files.  once we get down inside a particular "concept", the directory-
by-directory numbers aren't usually that useful (and/or the overhead of
logging every directory we visit dominates).

WRT allowing data events to ignore the nesting limit, we have the
same monorepo risk if for example someone logs how many files were
in each directory visited, the name of every untracked file, or every
file that had some property of interest.  That might look fine on a
shallow WT, but might generate an unusable amount of data on a large
repo.  so i'd recommend we proceed carefully here.  (i'm not against
it, i'm just saying we should think about it.)


using trace2 in a test to confirm that a piece of code was visited
or not visited is cool, but something i don't think i anticipated.
so i agree that we should increase the limit in the test suite.


and yes, the addition of trace2 calls inside the progress code
complicates the nesting.  I'm wondering if the progress code should
always emit the trace2 events, regardless of whether the progress msg
itself appears on the user's console.  That way we have consistent
logging -- the whole point of adding the region in the progress
code was to bracket the time spent doing the thing that needed
progress messaging after all.  the fact that you don't have a console
doesn't change the fact that there's an expensive computation there.

(i'd also consider changing the "total_objects" data event to include
include the region title so that they are more easily identified in a
grep, but I digress.)


Thanks,
Jeff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux