Re: [PATCH v2 3/8] repack: add --name-hash-version option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

[snip]

>  DESCRIPTION
>  -----------
> @@ -249,6 +251,36 @@ linkgit:git-multi-pack-index[1]).
>  	Write a multi-pack index (see linkgit:git-multi-pack-index[1])
>  	containing the non-redundant packs.
>
> +--name-hash-version=<n>::
> +	While performing delta compression, Git groups objects that may be
> +	similar based on heuristics using the path to that object. While
> +	grouping objects by an exact path match is good for paths with
> +	many versions, there are benefits for finding delta pairs across
> +	different full paths. Git collects objects by type and then by a
> +	"name hash" of the path and then by size, hoping to group objects
> +	that will compress well together.
> ++
> +The default name hash version is `1`, which prioritizes hash locality by
> +considering the final bytes of the path as providing the maximum magnitude
> +to the hash function. This version excels at distinguishing short paths
> +and finding renames across directories. However, the hash function depends
> +primarily on the final 16 bytes of the path. If there are many paths in
> +the repo that have the same final 16 bytes and differ only by parent
> +directory, then this name-hash may lead to too many collisions and cause
> +poor results. At the moment, this version is required when writing
> +reachability bitmap files with `--write-bitmap-index`.
> ++
> +The name hash version `2` has similar locality features as version `1`,
> +except it considers each path component separately and overlays the hashes
> +with a shift. This still prioritizes the final bytes of the path, but also
> +"salts" the lower bits of the hash using the parent directory names. This
> +method allows for some of the locality benefits of version `1` while
> +breaking most of the collisions from a similarly-named file appearing in
> +many different directories. At the moment, this version is not allowed
> +when writing reachability bitmap files with `--write-bitmap-index` and it
> +will be automatically changed to version `1`.
> +
> +

Nit: I wonder if it'd be nicer to simply point to the documentation in
'Documentation/git-pack-objects.txt'. This would ensure we have
consistent documentation and a single source of truth.
>
>  CONFIGURATION
>  -------------
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 05e13adb87f..5e7ff919c1a 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -39,7 +39,9 @@ static int run_update_server_info = 1;
>  static char *packdir, *packtmp_name, *packtmp;
>
>  static const char *const git_repack_usage[] = {
> -	N_("git repack [<options>]"),
> +	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
> +	   "[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n"
> +	   "[--write-midx] [--name-hash-version=<n>]"),
>  	NULL
>  };

So this fixes the mismatch in t0450 which is seen below.

Nit: might be worthwhile adding this in the commit message.

[snip]

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux