On 4/2/2019 8:00 PM, Jonathan Nieder wrote:
Hi,
Jeff Hostetler via GitGitGadget wrote:
Teach git to read the system config (usually "/etc/gitconfig") for
default Trace2 settings. This allows system-wide Trace2 settings to
be installed and inherited to make it easier to manage a collection of
systems.
Yay! Thanks for writing this, and sorry for the slow review.
[...]
Only the system config file is used. Trace2 config values are ignored
in local, global, and other config files. Likewise, the "-c" command
line arguments are ignored for Trace2 values. These limits are for
performance reasons.
Can you say a bit more about this? If I'm willing to pay the
performance cost, is there a way for me to turn on respecting other
config files? What is that performance cost?
Several thoughts here. Some are performance-related and some are
startup chicken-n-egg problems. I tried to add trace2 to the code
base in the least disruptive way possible and hopefully with minimal
side-effects on existing behaviors. And also minimum impact on overall
performance when not in use.
So, with that in mind:
[] Trace2 should be initialized as early as and lightly as possible
so that timers are started and it can capture events from other
startup activities. And so that we can tell as early as possible
if trace2 should NOT be enabled and short-cut those calls and not
waste time collecting unnecessary data.
[] Trace2 is initialized by calls from common-main.c:main(). This
happens before cmd_main() is called, so we are very early in the
startup process and very few things have been initialized. For
example, I had to let git_resolve_executable_dir() run so that
we could find the location of the system config, but that's about
it. We have not even initialized "the_repository" yet.
[] WRT "-c var=val", this is processed by git.c:handle_options() and
called from git.c:cmd_main() and git.c:handle_alias(). So the "-c"
args won't be respected until after that point.
Adding trace2 initialization after that point is a bit of a mess.
Teaching trace2 to scan the argv for "-c" is just duplicating effort.
And moving the "-c" processing earlier in the startup, changes the
behavior of the "git-*" commands (which don't currently recognize
the "-c" option).
So that's why I'm suggesting we not respect "-c" for this.
[] By initializing trace2 inside main() we guarantee that it will get
started. If we wait to initialize it until after handle_options()
returns, we miss events for commands that it handles itself (such
as "git --exec-path" where it just prints and exits or syntax errors
where it directly calls usage() and exits).
[] WRT to per-repo config values:
In common-main.c:main() we have not yet discovered the .git dir,
so repo-local config files are questionable at a very early point
in the process startup. Again, it comes down to whether we wait
for gitdir discovery (and whatever file system scanning, --gitdir,
or -C processing is required) before deciding whether to start
trace2 or not.
Also, per-repo trace2 config settings aren't available at the time
of the clone, so just relying on them will leave our telemetry
incomplete. It's better to have a system (or maybe a user) setting
and not a per-repo setting.
Longer term, with submodules and plans to support them in-proc rather
than via sub-processes, I don't think there should be any expectation
that trace2 settings would adapt during recursive operations. So for
example, a top-level command might see different trace2 settings than
a command run inside a submodule. If submodule operations become
in-proc, then to maintain that expected behavior we'd need to have
per-repo trace2 settings. (Granted, not impossible, but by saying no
to that now, we can eliminate a possible pain point in the future.)
[] i'll add more on this topic at the bottom of this note in response to
your per-user and includes questions.
[...]
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -117,6 +117,37 @@ values are recognized.
Socket type can be either `stream` or `dgram`. If the socket type is
omitted, Git will try both.
+== Trace2 Settings in System Config
+
+Trace2 also reads configuration information from the system config.
+This is intended to help adminstrators to gather system-wide Git
+performance data.
+
+Trace2 only reads the system configuration, it does not read global,
+local, worktree, or `-c` config settings.
An additional limitation is that this doesn't appear to support
include.* directives. Intended?
I didn't specifically intend to support or not-support include files
here. Frankly, the complexity of the config code makes my eyes bleed.
I'll investigate adding it.
[...]
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
[...]
+MOCK=./mock_system_config
+
+test_expect_success 'setup mocked /etc/gitconfig' '
+ git config --file $MOCK trace2.normalTarget "$(pwd)/trace.normal" &&
+ git config --file $MOCK trace2.normalBrief 1
+'
+
+test_expect_success 'using mock, normal stream, return code 0' '
+ test_when_finished "rm trace.normal actual expect" &&
+ GIT_TEST_TR2_SYSTEM_CONFIG="$MOCK" test-tool trace2 001return 0 &&
Tests run with GIT_CONFIG_NOSYSTEM=1 to protect themselves from any
system config the user has.
Thanks for the pointer. I'll update the tr2_sysenv_load() to check
git_system_config() before trying to use the system config file.
So this would be easier to test if we can use user-level config for
it.
I suppose. I think it just moves the testing problem to a trick using
$HOME, right? But it does get rid of that GIT_TEST_TR2_SYSTEM_CONFIG
symbol, so yeah a net win. I'll investigate. Thanks.
[...]
--- /dev/null
+++ b/trace2/tr2_sysenv.c
@@ -0,0 +1,128 @@
+#include "cache.h"
+#include "config.h"
+#include "dir.h"
+#include "tr2_sysenv.h"
+
+/*
+ * Each entry represents a trace2 setting.
+ * See Documentation/technical/api-trace2.txt
+ */
+struct tr2_sysenv_entry {
+ const char *env_var_name;
+ const char *git_config_name;
+
+ char *value;
+ unsigned int getenv_called : 1;
+};
+
+/*
+ * This table must match "enum tr2_sysenv_variable" in tr2_sysenv.h.
Can we deduplicate to avoid the need to match?
Perhaps using C99 array initializers would help:
[TR2_SYSENV_CFG_PARAM] = { ... },
Cool. I hadn't seen that syntax before.
[...]
+/*
+ * Load Trace2 settings from the system config (usually "/etc/gitconfig"
+ * unless we were built with a runtime-prefix). These are intended to
+ * define the default values for Trace2 as requested by the administrator.
+ */
+void tr2_sysenv_load(void)
+{
+ const char *system_config_pathname;
+ const char *test_pathname;
+
+ system_config_pathname = git_etc_gitconfig();
+
+ test_pathname = getenv("GIT_TEST_TR2_SYSTEM_CONFIG");
+ if (test_pathname) {
+ if (!*test_pathname || !strcmp(test_pathname, "0"))
+ return; /* disable use of system config */
+
+ /* mock it with given test file */
+ system_config_pathname = test_pathname;
+ }
+
+ if (file_exists(system_config_pathname))
+ git_config_from_file(tr2_sysenv_cb, system_config_pathname,
+ NULL);
This duplicates functionality from config.c and misses some features
along the way (e.g. support for include.*).
Would read_early_config work? If we want a variant that doesn't
discover_git_directory, we can change that function to handle it.
For our needs at Google, it would be very helpful to have support for
include.* and reading settings from at least $HOME/.gitconfig in
addition to /etc/gitconfig (even better if it supports per-repo config
as well). I believe it would simplify the code and tests, too. If
there's anything I can to help, please don't hesitate to ask.
read_early_config() might work, but it is pretty heavy and doesn't
address my above comments.
Perhaps a compromise would be to have it read just the system and
user config files (w/ includes turned on). It shouldn't take too
much time and it shouldn't have the startup-order dependencies, I
think.
Thoughts??
Thanks,
Jonathan
Thanks,
Jeff