Re: [PATCH 0/9] Repack objects into separate packfiles based on a filter

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
>> In some discussions, it was mentioned that such a feature, or a
>> similar feature in `git gc`, or in a new standalone command (perhaps
>> called `git prune-filtered`), should put the filtered out objects into
>> a new packfile instead of deleting them.
>>
>> Recently there were internal discussions at GitLab about either moving
>> blobs from inactive repos onto cheaper storage, or moving large blobs
>> onto cheaper storage. This led us to rethink repacking using a
>> filter, but moving the filtered out objects into a separate packfile
>> instead of deleting them.
>>
>> So here is a new patch series doing that while implementing the
>> `--filter=<filter-spec>` option in `git repack`.
>
> Very interesting idea, indeed, and would be very useful.
> Thanks.
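
For concreteness, the kind of invocation the series enables would be
something along these lines (the filter spec is only an illustration,
not taken from the patches):

    git repack -a -d --filter=blob:limit=1m

with the objects omitted by the filter going into a separate packfile
instead of being deleted.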

Overall, my feelings about the series are split.

One side of my brain thinks that the series does a very good job of
addressing the needs of those who want to partition their objects
into two classes, and that the problem I saw in the series was mostly
the way it was sold.  In other words, if it did not mention unbloating
lazily cloned repositories at all, I would have said "Yes!  It is an
excellent series."  And if it instead said "this mechanism is not
meant to be used to unbloat a lazily cloned repository, because,
among the objects that match the filter, it does not distinguish
those that are only locally available from those that are retrievable
from the promisor remotes", it would have been even better.
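
(For context on that distinction: roughly speaking, the main local
marker a partial clone has is the *.promisor file that sits next to
a pack fetched from the promisor remote, e.g.

    ls .git/objects/pack/*.promisor

and an object that matches the filter but lives outside such packs
may not be retrievable from anywhere else, which is why spelling the
limitation out matters.)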

To the other side of my brain, it smells as if the series wanted to
address the unbloating issue, but ended up with an unsatisfactory
solution, and used "partitioning objects in a full repository on the
server side " as an excuse for the resulting mechanism to still
exist, even though it is not usable for the original purpose.

Ideally, it would be great to have a mechanism that can be used for
both.  The "partitioning" can be treated as a degenerate case where
the repository does not have its upstream promisor (hence, any
object that matches the filtering criteria can be excluded from the
primary pack because there are no "not available (yet) in our
promisor" objects), while the "unbloat" case can know who its
promisors are and ask the promisors which objects, among those that
match the filtering criteria, are still available from them, and
exclude only those objects from the primary pack.
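
To illustrate the filtering side with existing plumbing (the filter
spec here is again just an example), one can already see which
objects a given filter would set aside:

    git rev-list --objects --all --filter=blob:limit=1m --filter-print-omitted

The omitted objects (the ones prefixed with "~" in that output) are
the candidates for the secondary pack; the "unbloat" case would
additionally have to ask the promisor which of them it can still
serve before letting go of them locally.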

Short of that ideal world, we may not be ready to tackle the
unbloating issue, but "partitioning" alone may still be a useful
feature.  In that case, perhaps the series can be salvaged by
updating how the feature is sold, with some comments indicating how
the mechanism could be extended in that direction later.

Thanks.
