Re: [PATCH] pack-objects: convert to use parse_options()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> @@ -2305,183 +2300,140 @@ static void get_object_list(int ac, const char **av)
>  		loosen_unused_packed_objects(&revs);
>  }
>  
> +static int option_parse_index_version(const struct option *opt,
> +				      const char *arg, int unset)
> +{
> +	char *c;
> +	const char *val = arg;
> +	pack_idx_opts.version = strtoul(val, &c, 10);
> +	if (pack_idx_opts.version > 2)
> +		die("unsupported index version %s", val);

Dying may have been the appropriate error path in the custom option
parser, but is it still so for a callback in the parse-options framework?

Or should this be a 'return error(_("..."), val)'?

> +	if (*c == ',' && c[1])
> +		pack_idx_opts.off32_limit = strtoul(c+1, &c, 0);

I do not think the original had this extra "&& c[1]" check and I didn't
see any explanation of this change in the log message, either.

> +static int option_parse_ulong(const struct option *opt,
> +			      const char *arg, int unset)
> +{
> +	if (unset)
> +		die("option %s does not accept negative form",
> +		    opt->long_name);
> +
> +	if (!git_parse_ulong(arg, opt->value))
> +		die("unable to parse value '%s' for option %s",
> +		    arg, opt->long_name);
> +	return 0;
> +}

Likewise on "die()".

> +#define OPT_ULONG(s, l, v, h) \
> +	{ OPTION_CALLBACK, (s), (l), (v), "n", (h),	\
> +	  PARSE_OPT_NONEG, option_parse_ulong }
> +
>  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  {
>  	int use_internal_rev_list = 0;
>  	int thin = 0;
>  	int all_progress_implied = 0;
> -	uint32_t i;
> -	const char **rp_av;
> -	int rp_ac_alloc = 64;
> -	int rp_ac;
> +	const char *rp_av[6];

Nice stackframe shrinkage.

> +	int rp_ac = 0;
> +	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
> +	struct option pack_objects_options[] = {
> +		OPT_SET_INT('q', "quiet", &progress,
> +			    "do not show progress meter", 0),
> +		OPT_SET_INT(0, "progress", &progress,
> +			    "show progress meter", 1),
> +		OPT_SET_INT(0, "all-progress", &progress,
> +			    "show progress meter during object writing phase", 2),

Sounds as if the progress is not shown during the counting phase.
Nothing ", too" at the end could not fix, though.

> +		OPT_BOOL(0, "all-progress-implied",
> +			 &all_progress_implied,
> +			 "similar to --all-progress when progress meter is shown"),

Hrm, interesting wording.

> +		{ OPTION_CALLBACK, 0, "index-version", NULL, "version",
> +		  "force generating pack index at a particular version",
> +		  0, option_parse_index_version },

Sounds as if you can generate a pack index at HEAD~24, but that is not
what you meant. "version" is too loaded a word that needs qualification.

Perhaps "write the pack index file in the specified idx format version".

"index" is also a loaded word, and our documentation tries to use "idx"
when we talk about the pack index (we say "index" when we mean the
dircache).

> +		OPT_INTEGER(0, "window", &window,
> +			    "limit pack window by objects"),
> +		OPT_ULONG(0, "window-memory", &window_memory_limit,
> +			  "limit pack window by memory"),
> +		OPT_INTEGER(0, "depth", &depth,
> +			    "limit pack window by maximum delta depth"),

These descriptions are somewhat interesting.

The "pack window" is limited by number of objects given by --window (and
by default this is set to 10), but if you give --window-memory, it could
further dynamically shrink it under memory pressure. If the logic were to
use this dynamic shrinkage without any limit on the number of objects in
flight when --window-memory and no --window is given, then you could say
"limit by memory" as if the "pack window" is counted in terms of number of
bytes in the window instead of number of objects and the description would
be accurate. But that is not quite the case as far as I know.

And --depth does not affect the size of the pack window at all. It rejects
an object with deep enough delta chain as a delta-base candidate to limit
the run-time performance penalty for the user of the resulting pack, but
does not stop the caller from trying the next object in the window.
"maximum length of delta chain allowed in the resulting pack", perhaps?

> +		OPT_BOOL(0, "reuse-delta", &reuse_delta,
> +			 "reusing existing deltas"),
> +		OPT_BOOL(0, "reuse-object", &reuse_object,
> +			 "reusing existing objects"),

s/reusing/reuse/???

> +		OPT_BOOL(0, "delta-base-offset", &allow_ofs_delta,
> +			 "use OFS_DELTA objects"),
> +		OPT_INTEGER(0, "threads", &delta_search_threads,
> +			    "use threads when searching for best delta matches"),
> +		OPT_BOOL(0, "non-empty", &non_empty,
> +			 "only create if it would contain at least one object"),

create what? "do not create an empty pack output"?

> +		OPT_BOOL(0, "revs", &use_internal_rev_list,
> +			 "read revision arguments from standard output"),

s/output/input/???

> +		{ OPTION_SET_INT, 0, "unpacked", &rev_list_unpacked, NULL,
> +		  "limit the objects to those that are not already packed",
> +		  PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },

s/already/yet/???

> +		{ OPTION_SET_INT, 0, "all", &rev_list_all, NULL,
> +		  "include all refs under $GIT_DIR/refs directory",
> +		  PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
> +		{ OPTION_SET_INT, 0, "reflog", &rev_list_reflog, NULL,
> +		  "include objects referred by reflog entries",
> +		  PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },

The latter is more correct---packfile does not pack refs, it packs
objects.

Perhaps former should match: "include objects reachable from any
reference". I would also prefer to avoid "$GIT_DIR/refs _directory_"
to discourage people from running "ls -R .git/refs/".

> +		OPT_BOOL(0, "stdout", &pack_to_stdout,
> +			 "output pack to stdout"),
> +		OPT_BOOL(0, "include-tag", &include_tag,
> +			 "include unasked-for annotated tags"),

"include tag objects that refer to objects to be packed"?

> +		OPT_BOOL(0, "keep-unreachable", &keep_unreachable,
> +			 "keep unreachable objects"),

What does this option mean these days (aka "where are they kept")?

IIRC, this was meant to be used in conjunction with the --unpacked= and
later with --kept-pack-only (see 03a9683d22) and then was later further
modified by 4d6acb70411.

I *think* this meant to include objects that are in packfiles that are not
marked with .keep even when they are not in the revision range to be
packed, so that we won't lose them during gc.

But I think --unpack-unreachable introduced by ca11b21 (let pack-objects
do the writing of unreachable objects as loose objects, 2008-05-14) that
writes these unreachable objects out of the pack to loose form made this
option unnecessary, and it remains unused ever since. It may not be a bad
idea to deprecate this option.

Obviously it is outside of the scope of this patch.

> +		OPT_BOOL(0, "unpack-unreachable", &unpack_unreachable,
> +			 "unpack unreachable objects"),
> +		OPT_BOOL(0, "thin", &thin,
> +			 "create thin packs"),
> +		OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep,
> +			 "ignore packs that have companion .keep file"),
> +		OPT_INTEGER(0, "compression", &pack_compression_level,
> +			    "pack compression level"),
> +		OPT_SET_INT(0, "keep-true-parents", &grafts_replace_parents,
> +			    "do not hide commits by grafts", 0),

It is noteworthy that this is truly "do not hide".

A graft entry can also add extra commits, but this option will not prevent
the command from including them, so that the result of repacking will keep
the union of parents in the real history (i.e. in commit objects) and
parents in the pretend history (i.e. added by grafts).

> +		OPT_END(),
> +	};
>  
>  	read_replace_refs = 0;
> ...
> -			continue;
> -		}
> -		usage(pack_usage);
> +	if (rev_list_unpacked) {
> +		use_internal_rev_list = 1;
> +		rp_av[rp_ac++] = "--unpacked";
> +	}
> +	if (rev_list_all) {
> +		use_internal_rev_list = 1;
> +		rp_av[rp_ac++] = "--all";
> +	}
> +	if (rev_list_reflog) {
> +		use_internal_rev_list = 1;
> +		rp_av[rp_ac++] = "--reflog";
>  	}

I'd prefer to have "unpacked" the last, to match the order of parameters
in which the typical caller, i.e. repack, calls us, but that is minor.

> @@ -2497,12 +2449,25 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	 * walker.
>  	 */
>  
> -	if (!pack_to_stdout)
> -		base_name = argv[i++];
> -
> -	if (pack_to_stdout != !base_name)
> -		usage(pack_usage);
> +	if (!pack_to_stdout) {
> +		if (!argc)
> +			die("base name required if --stdout is not given");
> +		base_name = argv[0];
> +		argc--;
> +	}
> +	if (argc)
> +		die("base name or --stdout are mutually exclusive");

Doesn't this change regress the most basic, i.e.

	$ git pack-objects
        usage: ...

Other than that, thanks for a pleasant and thought-provoking read.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]