Re: [PATCH 2/2] refs/files: use heuristic to decide whether to repack with `--auto`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patrick Steinhardt <ps@xxxxxx> writes:

> The `--auto` flag for git-pack-refs(1) allows the ref backend to decide
> whether or not a repack is in order. This switch has been introduced
> mostly with the "reftable" backend in mind, which already knows to
> auto-compact its tables during normal operations. When the flag is set,
> then it will use the same auto-compaction mechanism and thus end up
> doing nothing in most cases.
>
> The "files" backend does not have any such heuristic yet and instead

Nit: s/instead/will instead/

> packs any loose references unconditionally. So we rewrite the complete
> "packed-refs" file even if there's only a single loose reference to be
> packed.
>
> Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using
> `git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is
> unconditionally executed via our auto maintenance, so we end up repacking
> references every single time auto maintenance kicks in. And while that
> commit already mentioned that the "files" backend unconditionally packs
> refs now, the author obviously didn't quite think about the consequences
> thereof. So while the idea was sound, we really should have added a
> heuristic to the "files" backend before implementing it.
>
> Introduce a heuristic that decides whether or not it is worth to pack
> loose references. The important factors to decide here are the number of
> loose references in comparison to the overall size of the "packed-refs"
> file. The bigger the "packed-refs" file, the longer it takes to rewrite
> it and thus we scale up the limit of allowed loose references before we
> repack.
>
> As is the nature of heuristics, this mechansim isn't obviously
> "correct", but should rather be seen as a tradeoff between how much
> resources we spend packing refs and how inefficient the ref store
> becomes. For all I can say, we have successfully been using the exact
> same heuristic in Gitaly for several years by now.
>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  refs/files-backend.c          | 75 +++++++++++++++++++++++++++++++
>  refs/packed-backend.c         | 18 ++++++++
>  refs/packed-backend.h         |  7 +++
>  t/t0601-reffiles-pack-refs.sh | 85 ++++++++++++++++++++++++++++++-----
>  4 files changed, 175 insertions(+), 10 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 1cff65f6ae5..261e2b8570f 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1311,6 +1311,78 @@ static int should_pack_ref(struct files_ref_store *refs,
>  	return 0;
>  }
>
> +static size_t fastlog2(size_t sz)

Nit: We already `reftable/stack_test.c:fastlog2` and `bisect.c:log2i`. I
wonder if it would be nice to consolidate all of them. But I guess it's
okay, since the reftable code anyways is isolated from the rest of the
Git code.

> +{
> +	size_t l = 0;
> +	if (!sz)
> +		return 0;
> +	for (; sz; sz /= 2)
> +		l++;
> +	return l - 1;
> +}

[snip]

The rest of the patch looks great!

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux