On Thu, 22 Jun 2017 14:45:26 -0700 Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > On Thu, 22 Jun 2017 20:36:13 +0000 > Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote: > > > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > > > In preparation for partial/sparse clone/fetch where the > > server is allowed to omit large/all blobs from the packfile, > > teach traverse_commit_list() to take a blob filter-proc that > > controls when blobs are shown and marked as SEEN. > > > > Normally, traverse_commit_list() always marks visited blobs > > as SEEN, so that the show_object() callback will never see > > duplicates. Since a single blob OID may be referenced by > > multiple pathnames, we may not be able to decide if a blob > > should be excluded until later pathnames have been traversed. > > With the filter-proc, the automatic deduping is disabled. > > Comparing this approach (this patch and the next one) against mine [1], > I see that this has the advantage of (in pack-objects) avoiding the > invocation of add_preferred_base_object() on blobs that are filtered > out, avoiding polluting the "to_pack" data structure with information > that we do not need. > > But it does add an extra place where blobs are filtered out (besides > add_object_entry()), which makes it harder for the reader to keep track > of what's going on. I took a brief look to see if filtering could be > eliminated from add_object_entry(), but that function is called from > many places, so I couldn't tell. > > Anyway, I think both approaches will work (this patch's and mine [1]). > > [1] https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathantanmy@xxxxxxxxxx/ Also it should be mentioned somewhere that this does not cover the bitmap case - a similar discussion should be included in one of the patches, like I did in [1]. And looking back at my original cover letter [2], I wrote: > This is similar to [1] except that this [...] > is slightly more comprehensive in > that the read_object_list_from_stdin() codepath is also covered in > addition to the get_object_list() codepath. (Although, to be clear, > upload-pack always passes "--revs" and thus only needs the > get_object_list() codepath). (The [1] in the quote above refers to one of Jeff Hostetler's patches, [QUOTE 1].) The same issue applies to this patch (since read_object_list_from_stdin() invokes add_object_entry() directly without going through the traversal mechanism) and probably warrants at least some description in one of the commit messages. [1] https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathantanmy@xxxxxxxxxx/ [2] https://public-inbox.org/git/cover.1496361873.git.jonathantanmy@xxxxxxxxxx/ [QUOTE 1] https://public-inbox.org/git/1488994685-37403-3-git-send-email-jeffhost@xxxxxxxxxxxxx/