Re: [PATCH 1/3] list-objects: add filter_blob to traverse_commit_list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/22/2017 6:10 PM, Jonathan Tan wrote:
On Thu, 22 Jun 2017 14:45:26 -0700
Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:

On Thu, 22 Jun 2017 20:36:13 +0000
Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:

From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

In preparation for partial/sparse clone/fetch where the
server is allowed to omit large/all blobs from the packfile,
teach traverse_commit_list() to take a blob filter-proc that
controls when blobs are shown and marked as SEEN.

Normally, traverse_commit_list() always marks visited blobs
as SEEN, so that the show_object() callback will never see
duplicates.  Since a single blob OID may be referenced by
multiple pathnames, we may not be able to decide if a blob
should be excluded until later pathnames have been traversed.
With the filter-proc, the automatic deduping is disabled.

Comparing this approach (this patch and the next one) against mine [1],
I see that this has the advantage of (in pack-objects) avoiding the
invocation of add_preferred_base_object() on blobs that are filtered
out, avoiding polluting the "to_pack" data structure with information
that we do not need.

But it does add an extra place where blobs are filtered out (besides
add_object_entry()), which makes it harder for the reader to keep track
of what's going on. I took a brief look to see if filtering could be
eliminated from add_object_entry(), but that function is called from
many places, so I couldn't tell.

Anyway, I think both approaches will work (this patch's and mine [1]).

[1] https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathantanmy@xxxxxxxxxx/

Also it should be mentioned somewhere that this does not cover the
bitmap case - a similar discussion should be included in one of the
patches, like I did in [1].

And looking back at my original cover letter [2], I wrote:

This is similar to [1] except that this
[...]
is slightly more comprehensive in
that the read_object_list_from_stdin() codepath is also covered in
addition to the get_object_list() codepath. (Although, to be clear,
upload-pack always passes "--revs" and thus only needs the
get_object_list() codepath).

(The [1] in the quote above refers to one of Jeff Hostetler's patches,
[QUOTE 1].)

The same issue applies to this patch (since
read_object_list_from_stdin() invokes add_object_entry() directly
without going through the traversal mechanism) and probably warrants at
least some description in one of the commit messages.

[1] https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathantanmy@xxxxxxxxxx/
[2] https://public-inbox.org/git/cover.1496361873.git.jonathantanmy@xxxxxxxxxx/

[QUOTE 1] https://public-inbox.org/git/1488994685-37403-3-git-send-email-jeffhost@xxxxxxxxxxxxx/



Thanks for the quick feedback.  I'll dig into each of these comments
as I work on my next draft.

Jeff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux