Re: [PATCH] pack-refs: teach --exclude option to exclude refs from being packed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"John Cai via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: John Cai <jcai@xxxxxxxxxx>
>
> Currently once refs are packed into a pack-refs file, deleting them can be
> slow since it involves a full file scan of the entire pack-refs file. At

"pack-refs" -> "packed-refs".

But performing a full file scan of 100MB file would take how many
milliseconds?  Having to remove many refs from a single packed-refs
file would be costly if it is done one-by-one, though.  I wonder how
effective our batched update mechanism is these days...

Sorry for straying to a tangent.  In any case, I do not think the
first sentence is necessary; just start it with "At GitLab, ..." to
say that you have a need to be more selective than "is it a tag or
not?" to choose refs to be and not to be packed.  The reason why we
may want to leave a ref loose is not limited to ref deletion
performance, and the benefit of this new feature is not limited to
it, either.

> GitLab, we have a system that creates ephemeral internal refs that don't
> live long before getting deleted. Having an option to not include certain
> refs from a pack-refs file allows these internal references to be deleted
> much more efficiently.

I think that is a valid use case.

If we step back a bit, we can see "pack-refs" has an option "--all"
that was added by b3d4204f (git-pack-refs --all, 2006-10-08), to
allow us pack only tags by default, because packing branches that
are meant to be updated often and also removed more often than tags
were found to be detrimental.  We can view this new option a
follow-up work for the commit, to allow the users to define what to
be and what not to be packed, depending on their workflow.

This observation also makes us realize that we should consider the
opposite.  It would benefit us to include some refs that we normally
do not pack and be more selective than "--all" ("--exclude" proposed
here is a way to leave some refs that we normally pack and be more
selective than not running pack-refs at all).  A set of branches
that are only occasionally used may want to be packed while the rest
of branches want to be left loose because they are more active, or
something.

> Add an --exclude option to the pack-refs builtin, and use the ref
> exclusions API to exclude certain refs from being packed into the final
> pack-refs file

"pack-refs" appears here, too.

>  Documentation/git-pack-refs.txt |  8 +++++++-
>  builtin/pack-refs.c             | 17 +++++++++++++++--
>  refs.c                          |  4 ++--
>  refs.h                          |  6 +++++-
>  refs/debug.c                    |  4 ++--
>  refs/files-backend.c            | 13 ++++++++++---
>  refs/packed-backend.c           |  3 ++-
>  refs/refs-internal.h            |  4 +++-
>  t/helper/test-ref-store.c       |  3 ++-
>  t/t3210-pack-refs.sh            | 18 ++++++++++++++++++
>  10 files changed, 66 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/git-pack-refs.txt b/Documentation/git-pack-refs.txt
> index 154081f2de2..d80e0a1562d 100644
> --- a/Documentation/git-pack-refs.txt
> +++ b/Documentation/git-pack-refs.txt
> @@ -8,7 +8,7 @@ git-pack-refs - Pack heads and tags for efficient repository access
>  SYNOPSIS
>  --------
>  [verse]
> -'git pack-refs' [--all] [--no-prune]
> +'git pack-refs' [--all] [--no-prune] [--exclude <pattern>]
>  
>  DESCRIPTION
>  -----------
> @@ -59,6 +59,12 @@ a repository with many branches of historical interests.
>  The command usually removes loose refs under `$GIT_DIR/refs`
>  hierarchy after packing them.  This option tells it not to.
>  
> +--exclude <pattern>::
> +
> +Do not pack refs matching the given `glob(7)` pattern. Repetitions of this option
> +accumulate exclusion patterns. Use `--no-exclude` to clear and reset the list of
> +patterns.

The interaction of this option with "--all" needs to be described
somewhere.  If we are to be adding "--include" for completeness,
that one also needs to interact with "--all".

> diff --git a/refs.c b/refs.c
> index d2a98e1c21f..637810347a0 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2132,9 +2132,9 @@ void base_ref_store_init(struct ref_store *refs, struct repository *repo,
>  }
>  
>  /* backend functions */
> -int refs_pack_refs(struct ref_store *refs, unsigned int flags)
> +int refs_pack_refs(struct ref_store *refs, unsigned int flags, struct pack_refs_opts *pack_opts)
>  {
> -	return refs->be->pack_refs(refs, flags);
> +	return refs->be->pack_refs(refs, flags, pack_opts);

That's a curious choice of the API.  It is not like backend methods
all share the same "flags" bitset (they share "refs" pointer to the
ref_store), so I would have expected that it would become part of
the pack_refs_opts structure.  Do not call the parameter pack_opts;
either spell it out as "pack_refs_opts", or just use "opts".  The
latter is probably more preferrable as I do not expect it to be
ambiguous with other kinds of "opts".

The rest of the pack I found nothing unexpected or surprising.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux