On 6/6/2019 6:32 PM, Matthew DeVore wrote:
On Mon, Jun 03, 2019 at 05:51:28PM -0400, Jeff Hostetler wrote:
Since we are assuming 'compose' is an AND operation, there may be an
opportunity to short-cut some of this loop for blobs. That is, if the
object is a blob and any filter rejects it, it is omitted, so we don't
need to keep looping for that object. (Tree objects cannot be short-cut
this way because a tree may appear at different depths or in different
sparse "cones" and may have to be reconsidered.)
Blobs are also treated almost the same way as tree objects in tree:<depth>
filters - they can be included by tree:<depth> - so they also need to be
reconsidered when found at different depths.
But I agree it's always true that if some prior filter has excluded a blob, the
later filters don't even need to be *called at all* for that blob, unless
perhaps it's found under a different tree later. I also think it may be too
early to implement this optimization, since a filter in a later release may just
want to "know" about a blob even if it must be excluded in the final result.
Does the optimization apply to trees as well? Does a tree:<depth> filter still
want to consider children of tree X if tree X has already been excluded by
another filter? If it doesn't want to consider them, we can short-circuit the
checks very aggressively. If it does want to consider them, we want the
short-circuiting to be customizable, at least for trees.
A minor point - I don't think that short-circuiting the for loop (breaking out
early) is important, since it will be very rare that a combine: filter has more
than 4 or so sub-filters anyway. Calling the filter_fn implementation and
letting that do internal short-circuiting (informed by the previous filters'
results) can, however, skip a lot of computation.
So you could add an "affects blobs only" bit to the per-filter data
and try this out. For example a "compose:blob:none+sparse:foo" should
perform better than "compose:sparse:foo+blob:none" but give the same
results.
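
To make the ordering argument concrete, here is a rough sketch of the idea in Python rather than in git's C. Everything in it is an assumption made for illustration: SubFilter, combined_include, make_filter, run and the blobs_only flag are hypothetical names, not git's API, and "sparse:foo" is modeled as a simple path-prefix test. It only shows that, under the AND assumption, putting the blobs-only filter first gives the same answers while consulting the later filter less often.

```python
# Toy model only: the names below are hypothetical and are not git's API.

class SubFilter:
    def __init__(self, name, accepts, blobs_only=False):
        self.name = name
        self.accepts = accepts        # predicate: obj -> True if included
        self.blobs_only = blobs_only  # the proposed "affects blobs only" bit
        self.calls = 0                # how often this sub-filter was consulted

    def check(self, obj):
        self.calls += 1
        return self.accepts(obj)


def combined_include(sub_filters, obj):
    """AND all sub-filters; stop early once a blobs-only filter rejects a blob."""
    included = True
    for f in sub_filters:
        if not f.check(obj):
            included = False
            if obj["type"] == "blob" and f.blobs_only:
                break  # hard omit: no later filter can change the answer
    return included


def make_filter(spec):
    if spec == "blob:none":
        return SubFilter(spec, lambda o: o["type"] != "blob", blobs_only=True)
    if spec == "sparse:foo":
        # Stand-in for a sparse spec that keeps only the "foo" directory.
        return SubFilter(
            spec, lambda o: o["path"] == "foo" or o["path"].startswith("foo/"))
    raise ValueError(spec)


OBJECTS = [
    {"type": "tree", "path": "foo"},
    {"type": "blob", "path": "foo/a.c"},
    {"type": "blob", "path": "bar/b.c"},
    {"type": "tree", "path": "bar"},
]


def run(order):
    """Return (per-object results, number of calls made to sparse:foo)."""
    filters = [make_filter(spec) for spec in order]
    results = [combined_include(filters, obj) for obj in OBJECTS]
    sparse_calls = next(f.calls for f in filters if f.name == "sparse:foo")
    return results, sparse_calls
```

In this model, run(["blob:none", "sparse:foo"]) and run(["sparse:foo", "blob:none"]) return the same per-object results, but the first order consults the sparse stand-in only for the two trees, while the second consults it for all four objects.
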
Does "affects blobs only" mean the filter includes all non-blob objects?
I just meant that the blob:none and blob:limit filters give you a hard
omit. Other filters later in the chain cannot change or override that
answer (because of the AND assumption); it doesn't matter how deep or
shallow the blob is in the tree.
In the case of the tree:depth filter, a blob deep in the tree should
be provisionally omitted in case it appears later in a shallow tree
and should be included. The tree filter can't do a hard omit on a blob
(just like it can't do a hard omit on a tree node).
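
The difference between a hard omit and a provisional omit can be sketched roughly as follows, again in Python and again as a toy model rather than git's implementation. Verdict, blob_none, make_tree_depth and walk are hypothetical names, and the sketch assumes that an object at depth d passes a tree:<depth> filter when d is less than the limit. A hard omit is final, so later sightings of the object are skipped; a provisional omit is re-evaluated if the object turns up again at a shallower depth.

```python
from enum import Enum


class Verdict(Enum):
    INCLUDE = "include"
    PROVISIONAL_OMIT = "provisional-omit"  # may still be included later
    HARD_OMIT = "hard-omit"                # final, cannot be overridden


def blob_none(obj_type, depth):
    # Depth is irrelevant here: a blob is always rejected for good.
    return Verdict.HARD_OMIT if obj_type == "blob" else Verdict.INCLUDE


def make_tree_depth(limit):
    def tree_depth(obj_type, depth):
        # Assumption: objects at depth < limit are included; deeper ones are
        # only provisionally omitted, because they may reappear shallower.
        return Verdict.INCLUDE if depth < limit else Verdict.PROVISIONAL_OMIT
    return tree_depth


def walk(visits, filter_fn):
    """visits: iterable of (oid, obj_type, depth) in traversal order."""
    state = {}
    for oid, obj_type, depth in visits:
        if state.get(oid) in (Verdict.HARD_OMIT, Verdict.INCLUDE):
            continue  # already decided; no need to call the filter again
        state[oid] = filter_fn(obj_type, depth)
    return state


# The same blob is reached first deep in the tree, then near the root.
VISITS = [("b1", "blob", 3), ("b1", "blob", 1)]
```

With these visits, walk(VISITS, make_tree_depth(2)) first marks b1 as provisionally omitted and then includes it on the shallow sighting, whereas walk(VISITS, blob_none) settles on a hard omit at the first sighting and never looks at the second.
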
WRT your question about a later filter "just wanting to know" about
a blob, I'm not sure.
So yeah, let's wait on this. We can always add it later as an
optimization if/when it becomes a perf problem (and we have more
experience using them in practice).
Jeff