Re: [PATCH] repack: Add options to preserve and prune old pack files

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 07 Mar 2017 12:33:42 -0800

James Melvin <jmelvin@xxxxxxxxxxxxxx> writes:

> These options are designed to prevent stale file handle exceptions
> during git operations which can happen on users of NFS repos when
> repacking is done on them. The strategy is to preserve old pack files
> around until the next repack with the hopes that they will become
> unreferenced by then and not cause any exceptions to running processes
> when they are finally deleted (pruned).

I find it a very sensible strategy to work around NFS, but it does
not explain why the directory the old ones are moved to need to be
configurable.  It feels to me that a boolean that causes the old
ones renamed s/^pack-/^old-&/ in the same directory (instead of
pruning them right away) would risk less chances of mistakes (e.g.
making "preserved" subdirectory on a separate device mounted there
in a hope to reduce disk usage of the primary repository, which
may defeat the whole point of moving the still-active file around
instead of removing them).

And if we make "preserve-old" a boolean, perhaps the presence of
"prune-preserved" would serve as a substitute for it, iow, perhaps
we may only need --prune-preserved option (and repack.prunePreserved
configuration variable)?

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 677bc7c81..f1a0c97f3 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -10,8 +10,10 @@
>  
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> +static int preserve_oldpacks = 0;
> +static int prune_preserved = 0;

We avoid initializing statics to 0 or NULL and instead let BSS take
care of them...

>  static int write_bitmaps;
> -static char *packdir, *packtmp;
> +static char *packdir, *packtmp, *preservedir;

... just like what you did here.

> @@ -108,6 +110,27 @@ static void get_non_kept_pack_filenames(struct s
> ...
> +static void preserve_pack(const char *file_path, const char *file_name,  const char *file_ext)
> +{
> +	char *fname_old;
> +
> +	if (mkdir(preservedir, 0700) && errno != EEXIST)
> +		error(_("failed to create preserve directory"));

You do not want to do the rest of this function after issuing this
error, no?  Because ...

> +
> +	fname_old = mkpathdup("%s/%s.old-%s", preservedir, file_name, ++file_ext);
> +	rename(file_path, fname_old);

... this rename(2) would fail, whose error return you would catch
and act on.

> +	free(fname_old);
> +}
> +
> +static void remove_preserved_dir(void) {
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addstr(&buf, preservedir);
> +	remove_dir_recursively(&buf, 0);

This is a wrong helper function to use on files and directories
inside .git/; the function is about removing paths in the working
tree.

> @@ -121,7 +144,10 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  	for (i = 0; i < ARRAY_SIZE(exts); i++) {
>  		strbuf_setlen(&buf, plen);
>  		strbuf_addstr(&buf, exts[i]);
> -		unlink(buf.buf);
> +		if (preserve_oldpacks)
> +			preserve_pack(buf.buf, base_name, exts[i]);
> +		else
> +			unlink(buf.buf);

OK.

> @@ -194,6 +220,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  				N_("maximum size of each packfile")),
>  		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
>  				N_("repack objects in packs marked with .keep")),
> +		OPT_BOOL(0, "preserve-oldpacks", &preserve_oldpacks,
> +				N_("move old pack files into the preserved subdirectory")),
> +		OPT_BOOL(0, "prune-preserved", &prune_preserved,
> +				N_("prune old pack files from the preserved subdirectory after repacking")),
>  		OPT_END()
>  	};
>  
> @@ -217,6 +247,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  	packdir = mkpathdup("%s/pack", get_object_directory());
>  	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> +	preservedir = mkpathdup("%s/preserved", packdir);
>  
>  	sigchain_push_common(remove_pack_on_signal);
>  
> @@ -404,6 +435,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  	/* End of pack replacement. */
>  
> +	if (prune_preserved)
> +		remove_preserved_dir();

I am not sure if I understand your design.  Your model looks to me
like there are two modes of operation.  #1 uses "--preserve-old" and
sends old ones to purgatory instead of removing them and #2 uses
"--prune-preserved" to remove all the ones in the purgatory
immediately.  A few things that come to my mind immediately:

 * When "--prune-preseved" is used, it removes both ancient ones and
   more recent ones indiscriminately.  Would it make more sense to
   "expire" only the older ones while keeping the more recent ones?

 * It appears that the main reason you would want --prune-preserved
   in this design is after running with "--preserve-old" number of
   times, you want to remove really old ones that have accumulated,
   and I would imagine that at that point of time, you are only
   interested in repack, but the code structure tells me that this
   will force the users to first run a repack before pruning.

I suspect that a design that is simpler to explain to the users may
be to add a command line option "--preserve-pruned=<expiration>" and
a matching configuration variable repack.preservePruned, which
defaults to "immediate" (i.e. no preserving), and

 - When the value of preserve_pruned is not "immediate", use
   preserve_pack() instead of unlink();

 - At the end, find preserved packs that are older than the value in
   preserve_pruned and unlink() them.

It also may make sense to add another command line option
"--prune-preserved-packs-only" (without matching configuration
variable) that _ONLY_ does the "find older preserved packs and
unlink them" part, without doing any repack.