On Wed, 27 Sep 2017 13:04:42 -0400 Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote: > The sparse filter is looking at pathnames and using the same rules > as sparse-checkout to decide which to *include* in the result. This > is essentially backwards from the other filters which are looking for > reasons to *exclude* a blob. If I see a {pathname, sha} pair and the > pathname is not wanted (by the sparse-checkout rules), I still don't > know anything about the object -- since the same SHA may appear later > in the treewalk but with a different pathname that *does* match the > patterns and *is* wanted. > > The net-net is that I have to mark the blob as "provisionally omitted" > until I've seen all the pathnames associated with it. Only then can I > say it should be omitted. How is this different from refraining from marking the blob as LOFR_MARK_SEEN? When you would provisionally omit the blob, return LOFR_ZERO so that a future iteration will revisit the blob again, and when you would include it in the output, return LOFR_MARK_SEEN|LOFR_SHOW. > Likewise, there are things about the tree object that we cannot > decide until we've seen all possible directory paths that reference it. > For example, if you rename a tree/directory between 2 commits, but make no > other changes within the directory, it will/should have the same SHA in the > second commit. If one of those paths is included in the sparse-checkout > and one is not, then you need include those blobs (and the tree object) > in the result. If the treewalk visits the excluded case first, you don't > want to discard the tree object (and shortcut future treewalks) because > the filter won't get a chance to see the included directory path case. For trees, I guess it's slightly different in that you do need an extra flag to keep track of whether the tree has been shown. So mark SHOWN and return LOFR_SHOW on the first time the tree is shown, and LOFR_ZERO otherwise. And trees must never be marked as LOFR_MARK_SEEN. (This SHOWN flag might play a similar role to your FILTER_REVISIT.) Until now, it seems to me that the _END event is not required. > Also, the current code does not attempt to omit tree objects, but a > future version may. And having the _BEGIN_ and _END_ events means the > filter can keep track of the nesting without the expense of having to > discover it by parsing the pathname looking for slashes as we do elsewhere. A feature that omits tree objects would need _END, true. But until then, I personally don't think we should add such infrastructure until we have a feature that needs it.