On Wed, Jun 21, 2023 at 1:17 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > On Wed, Jun 14, 2023 at 09:25:38PM +0200, Christian Couder wrote: > > --- > > Documentation/git-repack.txt | 5 +++ > > builtin/repack.c | 75 ++++++++++++++++++++++++++++++++++-- > > t/t7700-repack.sh | 16 ++++++++ > > 3 files changed, 93 insertions(+), 3 deletions(-) > > Having read through the implementation in the repack builtin, I am > almost certain that my suggestion earlier in the thread to implement > this in terms of 'git pack-objects --filter' and 'git pack-objects > --stdin-packs' would work. Yeah, it works, thanks. That's what is used in the version 2 I just sent. > > diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt > > index 4017157949..aa29c7e648 100644 > > --- a/Documentation/git-repack.txt > > +++ b/Documentation/git-repack.txt > > @@ -143,6 +143,11 @@ depth is 4095. > > a larger and slower repository; see the discussion in > > `pack.packSizeLimit`. > > > > +--filter=<filter-spec>:: > > + Remove objects matching the filter specification from the > > + resulting packfile and put them into a separate packfile. See > > + linkgit:git-rev-list[1] for valid `<filter-spec>` forms. > > + > > This documentation leaves me with a handful of questions about how it > interacts with other options. I have improved this doc in version 2. It now says: --filter=<filter-spec>:: Remove objects matching the filter specification from the resulting packfile and put them into a separate packfile. Note that objects used in the working directory are not filtered out. So for the split to fully work, it's best to perform it in a bare repo and to use the `-a` and `-d` options along with this option. See linkgit:git-rev-list[1] for valid `<filter-spec>` forms. For comparison the doc about --cruft is: --cruft:: Same as `-a`, unless `-d` is used. Then any unreachable objects are packed into a separate cruft pack. Unreachable objects can be pruned using the normal expiry rules with the next `git gc` invocation (see linkgit:git-gc[1]). Incompatible with `-k`. > Here are some: > > - What happens when you pass it with "-d"? Does it delete objects that > didn't match the filter? Leave them alone? With or without "-d", no object is deleted, objects are just put into 2 different packfiles depending on whether or not they match the filter. > If the latter, should > this combination be declared invalid instead of silently ignoring > the user's request to delete redundant packs? "-d" is indeed about removing redundant packs. If you want to split the objects into different packs according to a filter, then you probably don't want to keep packs that contain both kinds of objects that you want to separate into different packs, so the doc now recommends using "-d" (along with "-a") when using "--filter". That's also what the tests are using. I am not sure it's worth erroring out when "-d", or "-a", is not used. Perhaps there are some use cases where it's interesting to keep old packs, or, in case of not using "-a", to split only loose objects into separate packs, but not objects from existing packs. > - What happens with --max-pack-size? Does the filtered pack get split > into multiple packs (as I think we would expect from such a > combination)? Yes, in version 2 I have added a test (in the commit introducing --filter-to) that checks that objects that are filtered out and that are larger than --max-pack-size get split into their own packs. > - What about with `--cruft`? Does it split the cruft pack into two > based on whether or not the unreachable object(s) matched or didn't > match the filter? I am not sure as I don't know well how cruft works. I would need to check the code and/or test it. > - What happens when passed with "--geometric"? I don't think there is > a sensible interpretation (at least, I can't think of what it would > mean to do "--filter=<spec> --geometric=<factor>" off the top of my > head). Not sure either, as I also don't know well how --geometric works. > - What about with "--write-bitmap-index"? Do we write one bitmap > index? Two? If the latter, do we combine the packs into a MIDX > before writing the bitmap? Should we? In the tests, repack --filter is used with "-c repack.writebitmaps=false" as otherwise the bitmap writing code tends to complain that a pack is not complete or something like that. I think it would make sense to have options that would make the bitmap writing code write a regular bitmap index if all the objects are in a single pack, but a MIDX if they are in multiple packs. I don't know what --cruft or --max-pack-size are doing about this, but I think such a feature could be useful in case those options are used. In the meantime I would be Ok with disallowing using both --write-bitmap-index and --filter. The issue is that writing a bitmap index is the default behavior, even without --write-bitmap-index, so it might be a bit more complex, but I plan to do something about it in version 3. > I think it may be worth spelling out answers to some of these questions > in the documentation, and codifying those answers in the form of tests. In version 2, the doc has been improved and now it seems comparable to the --cruft doc. About the tests, I added one in version 2 related to --max-pack-size, and I am open to adding a few others, but I don't think it's a good idea to add a lot of them. It just doesn't scale when commands have more than a few options. The new code is reusing existing features (like --stdin-packs) that are already used by other code (like --cruft code). It is not adding new intricate mechanisms, nor a lot of new code. It is not changing code used by other features except for a few clean refactorings. This should give a good indication that it will work well with other features. > This makes me wonder whether or not this option should belong in repack > at all, or whether there should be some new special-purpose builtin that > is designed to split existing pack(s) based on whether or not they meet > some filter criteria. --cruft is also splitting existing packs based on some criteria. So if we go this way, perhaps --cruft code should be moved first to this new special purpose builtin?