Re: [PATCH 03/13] list-objects: filter objects in traverse_commit_list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 9/26/2017 6:31 PM, Jonathan Tan wrote:
On Fri, 22 Sep 2017 20:26:22 +0000
Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:

From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Create traverse_commit_list_filtered() and add filtering

You mention _filtered() here, but this patch contains _worker().

thanks.

  list-objects.h | 30 ++++++++++++++++++++++++++
  2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index b3931fa..3e86008 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
  			 show_object_fn show,
  			 struct strbuf *path,
  			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)

I had some thoughts about modifying "show" (instead of adding "filter",
as you have done in this patch) to indicate to the caller that the
corresponding object should not be marked "seen". That does have the
advantage that we don't have so many callbacks flying around, but it
also makes things more complicated, so I don't know which is better.

I wanted the filtering function to be independent of the showing function.
As you can see in pack-objects.c and rev-list.c:  All I needed to change
there was to call the new version of the traverse code and not alter the
show function.  The filtering is completely isolated inside the traverse
code.

It also means that the filtering code can do higher-level filtering.
Currently, the show-object callback gets called with each terminal node
(blob) and it has already lost any chance for tree-level optimizations.
The filtering code participates in the treewalk on the commit and can do
subtree elimination.

+	if (filter) {
+		r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+		if (r & LOFR_MARK_SEEN)
+			obj->flags |= SEEN;
+		if (r & LOFR_SHOW)
+			show(obj, base->buf, cb_data);
+	}

This LOFT_END_TREE was added to support the code in
list-objects-filter-sparse - would it be OK to completely remove the
optimization involving the "provisional" omission of blobs? (I don't
think the exact same tree object will occur multiple times often.) If
yes, then I think this block can be removed too and will simplify the
code.

The sparse filter is looking at pathnames and using the same rules
as sparse-checkout to decide which to *include* in the result.  This
is essentially backwards from the other filters which are looking for
reasons to *exclude* a blob.  If I see a {pathname, sha} pair and the
pathname is not wanted (by the sparse-checkout rules), I still don't
know anything about the object -- since the same SHA may appear later
in the treewalk but with a different pathname that *does* match the
patterns and *is* wanted.

The net-net is that I have to mark the blob as "provisionally omitted"
until I've seen all the pathnames associated with it.  Only then can I
say it should be omitted.

Likewise, there are things about the tree object that we cannot
decide until we've seen all possible directory paths that reference it.
For example, if you rename a tree/directory between 2 commits, but make no
other changes within the directory, it will/should have the same SHA in the
second commit.  If one of those paths is included in the sparse-checkout
and one is not, then you need include those blobs (and the tree object)
in the result.  If the treewalk visits the excluded case first, you don't
want to discard the tree object (and shortcut future treewalks) because
the filter won't get a chance to see the included directory path case.

Also, the current code does not attempt to omit tree objects, but a
future version may.  And having the _BEGIN_ and _END_ events means the
filter can keep track of the nesting without the expense of having to
discover it by parsing the pathname looking for slashes as we do elsewhere.


+/* See object.h and revision.h */
+#define FILTER_REVISIT (1<<25)

If you do add this, also update object.h. But I don't think this is the
right place to do it - it's probably better in
list-objects-filter-sparse.

+typedef enum list_objects_filter_result list_objects_filter_result;
+typedef enum list_objects_filter_type list_objects_filter_type;

I don't think Git style is to typedef enums.

+void traverse_commit_list_worker(
+	struct rev_info *,
+	show_commit_fn, show_object_fn, void *show_data,
+	filter_object_fn filter, void *filter_data);

Here (and throughout), as far as I can tell, the style is to indent to
the character after the opening parenthesis.


right. thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux