Re: [PATCH 6/9] repack: add `--filter=<filter-spec>` option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 21, 2023 at 1:17 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Wed, Jun 14, 2023 at 09:25:38PM +0200, Christian Couder wrote:
> > ---
> >  Documentation/git-repack.txt |  5 +++
> >  builtin/repack.c             | 75 ++++++++++++++++++++++++++++++++++--
> >  t/t7700-repack.sh            | 16 ++++++++
> >  3 files changed, 93 insertions(+), 3 deletions(-)
>
> Having read through the implementation in the repack builtin, I am
> almost certain that my suggestion earlier in the thread to implement
> this in terms of 'git pack-objects --filter' and 'git pack-objects
> --stdin-packs' would work.

Yeah, it works, thanks. That's what is used in the version 2 I just sent.

> > diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
> > index 4017157949..aa29c7e648 100644
> > --- a/Documentation/git-repack.txt
> > +++ b/Documentation/git-repack.txt
> > @@ -143,6 +143,11 @@ depth is 4095.
> >       a larger and slower repository; see the discussion in
> >       `pack.packSizeLimit`.
> >
> > +--filter=<filter-spec>::
> > +     Remove objects matching the filter specification from the
> > +     resulting packfile and put them into a separate packfile. See
> > +     linkgit:git-rev-list[1] for valid `<filter-spec>` forms.
> > +
>
> This documentation leaves me with a handful of questions about how it
> interacts with other options.

I have improved this doc in version 2. It now says:

--filter=<filter-spec>::
       Remove objects matching the filter specification from the
       resulting packfile and put them into a separate packfile. Note
       that objects used in the working directory are not filtered
       out. So for the split to fully work, it's best to perform it
       in a bare repo and to use the `-a` and `-d` options along with
       this option.  See linkgit:git-rev-list[1] for valid
       `<filter-spec>` forms.

For comparison the doc about --cruft is:

--cruft::
       Same as `-a`, unless `-d` is used. Then any unreachable objects
       are packed into a separate cruft pack. Unreachable objects can
       be pruned using the normal expiry rules with the next `git gc`
       invocation (see linkgit:git-gc[1]). Incompatible with `-k`.

> Here are some:
>
>   - What happens when you pass it with "-d"? Does it delete objects that
>     didn't match the filter? Leave them alone?

With or without "-d", no object is deleted, objects are just put into
2 different packfiles depending on whether or not they match the
filter.

> If the latter, should
>     this combination be declared invalid instead of silently ignoring
>     the user's request to delete redundant packs?

"-d" is indeed about removing redundant packs. If you want to split
the objects into different packs according to a filter, then you
probably don't want to keep packs that contain both kinds of objects
that you want to separate into different packs, so the doc now
recommends using "-d" (along with "-a") when using "--filter". That's
also what the tests are using.

I am not sure it's worth erroring out when "-d", or "-a", is not used.
Perhaps there are some use cases where it's interesting to keep old
packs, or, in case of not using "-a", to split only loose objects into
separate packs, but not objects from existing packs.

>   - What happens with --max-pack-size? Does the filtered pack get split
>     into multiple packs (as I think we would expect from such a
>     combination)?

Yes, in version 2 I have added a test (in the commit introducing
--filter-to) that checks that objects that are filtered out and that
are larger than --max-pack-size get split into their own packs.

>   - What about with `--cruft`? Does it split the cruft pack into two
>     based on whether or not the unreachable object(s) matched or didn't
>     match the filter?

I am not sure as I don't know well how cruft works. I would need to
check the code and/or test it.

>   - What happens when passed with "--geometric"? I don't think there is
>     a sensible interpretation (at least, I can't think of what it would
>     mean to do "--filter=<spec> --geometric=<factor>" off the top of my
>     head).

Not sure either, as I also don't know well how --geometric works.

>   - What about with "--write-bitmap-index"? Do we write one bitmap
>     index? Two? If the latter, do we combine the packs into a MIDX
>     before writing the bitmap? Should we?

In the tests, repack --filter is used with "-c
repack.writebitmaps=false" as otherwise the bitmap writing code tends
to complain that a pack is not complete or something like that. I
think it would make sense to have options that would make the bitmap
writing code write a regular bitmap index if all the objects are in a
single pack, but a MIDX if they are in multiple packs. I don't know
what --cruft or --max-pack-size are doing about this, but I think such
a feature could be useful in case those options are used.

In the meantime I would be Ok with disallowing using both
--write-bitmap-index and --filter. The issue is that writing a bitmap
index is the default behavior, even without --write-bitmap-index, so
it might be a bit more complex, but I plan to do something about it in
version 3.

> I think it may be worth spelling out answers to some of these questions
> in the documentation, and codifying those answers in the form of tests.

In version 2, the doc has been improved and now it seems comparable to
the --cruft doc. About the tests, I added one in version 2 related to
--max-pack-size, and I am open to adding a few others, but I don't
think it's a good idea to add a lot of them. It just doesn't scale
when commands have more than a few options.

The new code is reusing existing features (like --stdin-packs) that
are already used by other code (like --cruft code). It is not adding
new intricate mechanisms, nor a lot of new code. It is not changing
code used by other features except for a few clean refactorings. This
should give a good indication that it will work well with other
features.

> This makes me wonder whether or not this option should belong in repack
> at all, or whether there should be some new special-purpose builtin that
> is designed to split existing pack(s) based on whether or not they meet
> some filter criteria.

--cruft is also splitting existing packs based on some criteria. So if
we go this way, perhaps --cruft code should be moved first to this new
special purpose builtin?




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux