Re: [PATCH 01/11] maintenance: create basic maintenance runner

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/12/2020 5:03 PM, Jonathan Nieder wrote:
> Derrick Stolee wrote:
> 
>> Helped-by: Jonathan Nieder <jrnieder@xxxxxxxxx>
>> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>> ---
>>  .gitignore                        |  1 +
>>  Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
>>  builtin.h                         |  1 +
>>  builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
>>  git.c                             |  1 +
>>  t/t7900-maintenance.sh            | 19 +++++++++++
>>  t/test-lib-functions.sh           | 33 ++++++++++++++++++
>>  7 files changed, 169 insertions(+)
>>  create mode 100644 Documentation/git-maintenance.txt
>>  create mode 100755 t/t7900-maintenance.sh
> 
> Looks reasonable.
> 
> [...]
>> --- /dev/null
>> +++ b/Documentation/git-maintenance.txt
>> @@ -0,0 +1,57 @@
>> +git-maintenance(1)
>> +==================
>> +
>> +NAME
>> +----
>> +git-maintenance - Run tasks to optimize Git repository data
>> +
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git maintenance' run [<options>]
>> +
>> +
>> +DESCRIPTION
>> +-----------
>> +Run tasks to optimize Git repository data, speeding up other Git commands
>> +and reducing storage requirements for the repository.
>> ++
>> +Git commands that add repository data, such as `git add` or `git fetch`,
>> +are optimized for a responsive user experience. These commands do not take
>> +time to optimize the Git data, since such optimizations scale with the full
>> +size of the repository while these user commands each perform a relatively
>> +small action.
>> ++
>> +The `git maintenance` command provides flexibility for how to optimize the
>> +Git repository.
>> +
>> +SUBCOMMANDS
>> +-----------
>> +
>> +run::
>> +	Run one or more maintenance tasks.
> 
> This is still confusing --- shouldn't it say something like "Run the
> `gc` maintenance task (see below)"?

As I mentioned in the earlier version, I disagree with this.

> [...]
>> +TASKS
>> +-----
>> +
>> +gc::
>> +	Cleanup unnecessary files and optimize the local repository. "GC"
> 
> nit: cleanup is the noun, "clean up" is the verb
> 
>> +	stands for "garbage collection," but this task performs many
>> +	smaller tasks. This task can be rather expensive for large
> 
> nit: the word "rather" is not doing much work here, so we could leave
> it out

Both good.

> 
>> +	repositories, as it repacks all Git objects into a single pack-file.
>> +	It can also be disruptive in some situations, as it deletes stale
>> +	data.
> 
> What stale data is this referring to?  Where can I read more about
> what disruption it causes (or in other words, as a user, how would I
> decide when *not* to run this command)?

For this and the next comment, I prefer (for now) to keep all of the
deep details of the 'gc' task in the git-gc.txt documentation. However,
I should use "linkgit:git-gc[1]" to point users to that for reference.

Eventually, the maintenance builtin might replace the gc builtin, but
only after all of its behavior has been extracted into smaller tasks.
Those tasks would be documented in detail here, allowing us to make the
blurb for the 'gc' task be "Run the 'prune-reflog', 'pack-refs', ...,
and 'full-repack' tasks."

> [...]
>> +OPTIONS
>> +-------
>> +--auto::
>> +	When combined with the `run` subcommand, run maintenance tasks
>> +	only if certain thresholds are met. For example, the `gc` task
>> +	runs when the number of loose objects exceeds the number stored
>> +	in the `gc.auto` config setting, or when the number of pack-files
>> +	exceeds the `gc.autoPackLimit` config setting.
> 
> Hm, today I learned.  I had thought that --auto doesn't only affect
> thresholds but also would affect how aggressive the gc is when
> triggered, but the git-gc(1) manpage agrees with what's said above.
> 
> Is there a way we can share information between the two to avoid one
> falling out of date?  For example, should git-gc.txt point to this
> page for the authoritative description?

I remember Junio mentioning something about how 'gc' interprets '--auto'
as a "quick" mode, but like you I cannot find any reference to that in
the docs. Since it is not documented there and none of the other tasks
I am implementing here use that interpretation, I'll plan to leave this
portion as-is.
 
> [...]
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>>  
>>  	return 0;
>>  }
>> +
>> +static const char * const builtin_maintenance_usage[] = {
>> +	N_("git maintenance run [<options>]"),
> 
> not about this patch: I wish we could automatically generate a usage
> string in this simple kind of case, to decrease the burden on
> translators.
> 
> [...]
>> +static int maintenance_task_gc(struct maintenance_opts *opts)
>> +{
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_push(&child.args, "gc");
>> +
>> +	if (opts->auto_flag)
>> +		strvec_push(&child.args, "--auto");
>> +
>> +	close_object_store(the_repository->objects);
> 
> Interesting --- what does this function call do?

We need to close our file handles on the pack-files (and
commit-graph and multi-pack-index) or else the GC subcommand
cannot delete the files on Windows.

>> +	return run_command(&child);
>> +}
>> +
>> +static int maintenance_run(struct maintenance_opts *opts)
>> +{
>> +	return maintenance_task_gc(opts);
>> +}
>> +
>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>> +{
>> +	static struct maintenance_opts opts;
>> +	static struct option builtin_maintenance_options[] = {
>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>> +			 N_("run tasks based on the state of the repository")),
>> +		OPT_END()
>> +	};
> 
> optional: these could be local instead of static

I see my problem here. The builtin_maintenance_options is static
(copied from another builtin, I'm sure) which is why the 'opts'
struct needs to be static or else this doesn't compile.

>> +
>> +	memset(&opts, 0, sizeof(opts));
> 
> not needed if it's static.  If it's not static, could use `opts = {0}`
> where it's declared to do the same thing in a simpler way.
> 
>> +
>> +	if (argc == 2 && !strcmp(argv[1], "-h"))
>> +		usage_with_options(builtin_maintenance_usage,
>> +				   builtin_maintenance_options);
>> +
>> +	argc = parse_options(argc, argv, prefix,
>> +			     builtin_maintenance_options,
>> +			     builtin_maintenance_usage,
>> +			     PARSE_OPT_KEEP_UNKNOWN);
>> +
>> +	if (argc == 1) {
>> +		if (!strcmp(argv[0], "run"))
>> +			return maintenance_run(&opts);
>> +	}
>> +
>> +	usage_with_options(builtin_maintenance_usage,
>> +			   builtin_maintenance_options);
> 
> optional: could reverse this test to get the uninteresting case out of
> the way first:
> 
> 	if (argc != 1)
> 		usage ...
> 
> 	...
> 
> That would also allow making the usage string when argv[0] is not
> "run" more specific.

Sure. Were you thinking of anything more specific than

	die(_("invalid subcommand: %s"), argv[0]);

? Of course, running "git maintenance barf" would result in

	fatal: invalid subcommand: barf

with an exit code of 128 instead of 129 like the usage string does,
so I'm not sure of the best way to make a custom "usage" error.

> [...]
>> --- /dev/null
>> +++ b/t/t7900-maintenance.sh
>> @@ -0,0 +1,19 @@
>> +#!/bin/sh
>> +
>> +test_description='git maintenance builtin'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'help text' '
>> +	test_expect_code 129 git maintenance -h 2>err &&
>> +	test_i18ngrep "usage: git maintenance run" err
>> +'
>> +
>> +test_expect_success 'run [--auto]' '
>> +	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
>> +	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
>> +	test_subcommand git gc <run-no-auto.txt &&
>> +	test_subcommand git gc --auto <run-auto.txt
> 
> Nice.
> 
> [...]
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -1561,3 +1561,36 @@ test_path_is_hidden () {
>>  	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
>>  	return 1
>>  }
>> +
>> +# Check that the given command was invoked as part of the
>> +# trace2-format trace on stdin.
>> +#
>> +#	test_subcommand [!] <command> <args>... < <trace>
>> +#
>> +# For example, to look for an invocation of "git upload-pack
>> +# /path/to/repo"
>> +#
>> +#	GIT_TRACE2_EVENT=event.log git fetch ... &&
>> +#	test_subcommand git upload-pack "$PATH" <event.log
>> +#
>> +# If the first parameter passed is !, this instead checks that
>> +# the given command was not called.
>> +#
>> +test_subcommand () {
>> +	local negate=
>> +	if test "$1" = "!"
>> +	then
>> +		negate=t
>> +		shift
>> +	fi
>> +
>> +	local expr=$(printf '"%s",' "$@")
>> +	expr="${expr%,}"
>> +
>> +	if test -n "$negate"
>> +	then
>> +		! grep "\[$expr\]"
>> +	else
>> +		grep "\[$expr\]"
>> +	fi
>> +}
> 
> With whatever subset of the changes described above makes sense,
> 
> Reviewed-by: Jonathan Nieder <jrnieder@xxxxxxxxx>
> 
> Thanks for your patient work.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux