Re: [PATCH 2/9] pack-objects: add `--print-filtered` to print omitted objects

On Wed, Jun 21, 2023 at 12:52 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Thu, Jun 15, 2023 at 03:50:17PM -0700, Junio C Hamano wrote:

> > Makes sense.  It is a bit sad that we have to accumulate everything
> > until the end at which time we have to dump the accumulated in bulk,
> > but that is a current limitation of list-objects-filter API and not
> > within the scope of this change.  We may in the longer term want to
> > see if we can make the collection of filtered-out objects streamable
> > by replacing the .omits object array with a callback function, or do
> > something along that line.
>
> Hmm. I think it is possible to use something like `git pack-objects`'s
> `--stdin-packs` mode to accomplish this without needing to keep track of
> the set of discarded objects (i.e. those which don't match the filter).
>
> IIUC, the set of objects which don't match the filter is the same as the
> set of all objects in packs beforehand, differenced with the set of
> objects that shows up in the pack containing objects which *do* match
> the filter.
>
> If you list all of the "before" packs without a `^` prefix in the input
> to `--stdin-packs`, and then pass along the pack containing the filtered
> set with a `^` prefix (to indicate that the resulting pack should not
> contain any objects which appear in that pack), I think you would end up
> with the set of non-matching objects.

I agree that it can be done like this, but I am not sure it would be
very efficient. When we run the filter to create the pack, we already
know the set of objects we filtered out, so it seems wasteful to make
`git pack-objects --stdin-packs` read more packfiles or their indexes
than necessary just to compute that set of objects again.
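
To make the comparison concrete, here is a toy model of the two
approaches. It is only a sketch: a "pack" is modelled as a plain set of
object IDs, the filter is a size predicate, and every name in it
(`filter_with_omits`, `omitted_by_difference`, `keep_small`) is made up
for illustration and does not exist in Git.

```python
# Hypothetical model: a "pack" is just a set of object IDs, and the filter
# is a predicate on (object ID, size). None of these names exist in Git.

def filter_with_omits(objects, keep):
    """Single pass: split objects into kept and omitted while filtering."""
    kept, omitted = set(), set()
    for oid, size in objects.items():
        (kept if keep(oid, size) else omitted).add(oid)
    return kept, omitted


def omitted_by_difference(before_packs, filtered_pack):
    """Second pass: re-read every 'before' pack, then subtract the filtered pack."""
    everything = set().union(*before_packs)
    return everything - filtered_pack


objects = {"a1": 10, "b2": 5000, "c3": 20, "d4": 9000}
keep_small = lambda oid, size: size < 1000  # stand-in for a blob size limit

# Approach 1: the omitted set falls out of the filtering pass itself.
kept, omitted = filter_with_omits(objects, keep_small)

# Approach 2: recompute the same set afterwards from the packs.
before_packs = [{"a1", "b2"}, {"c3", "d4"}]
recomputed = omitted_by_difference(before_packs, kept)
```

Both approaches end up with the same set; the second one just has to
enumerate every object in the "before" packs again to get there. The
sketch says nothing about how expensive that enumeration is in practice.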

Now, I haven't checked whether there is a real performance difference for
large packfiles, and perhaps `git pack-objects --stdin-packs` is very
efficient. But I hope that the approach I implemented, perhaps combined
with some of the optimization ideas Junio suggested above, will make it
easier to improve performance in the future.
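
As a rough illustration of the callback idea Junio mentioned, the
filtering step could report each omitted object as soon as it is seen
instead of accumulating them all until the end. This is a sketch under
my own assumptions, not the list-objects-filter API: `filter_objects`
and `on_omit` are hypothetical names, and objects are modelled as
(ID, size) pairs.

```python
def filter_objects(objects, keep, on_omit):
    """Yield kept object IDs; hand each omitted ID to on_omit immediately."""
    for oid, size in objects:
        if keep(oid, size):
            yield oid
        else:
            # No intermediate "omits" collection: the caller decides whether
            # to print, count, or store the omitted object.
            on_omit(oid)


printed = []  # stands in for writing omitted IDs to an output stream
objects = [("a1", 10), ("b2", 5000), ("c3", 20), ("d4", 9000)]
kept = list(filter_objects(objects, lambda oid, size: size < 1000, printed.append))
```

Here `printed.append` plays the role of the callback; a real caller
could write each ID out directly, so memory use would no longer grow
with the number of omitted objects.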



