Re: [PATCH v2 4/9] list-objects-filter: implement composite filters

On 6/6/2019 6:32 PM, Matthew DeVore wrote:
> On Mon, Jun 03, 2019 at 05:51:28PM -0400, Jeff Hostetler wrote:
>> Since we are assuming 'compose' is an AND operation, there may be an
>> opportunity to short-cut some of this loop for blobs.  That is, if the
>> object is a blob and any filter rejects it, it is omitted, so we don't
>> need to keep looping for that object.  (Tree objects cannot be short-cut
>> this way because a tree may appear at different depths or in different
>> sparse "cones" and may have to be reconsidered.)
>
> Blobs are also treated almost the same way as tree objects in tree:<depth>
> filters - they can be included by tree:<depth> - so they also need to be
> reconsidered when found at different depths.
>
> But I agree it's always true that if some prior filter has excluded a blob, the
> later filters don't even need to be *called at all* for that blob, unless
> perhaps it's found under a different tree later. I also think it may be too
> early to implement this optimization, since a filter in a later release may just
> want to "know" about a blob even if it must be excluded in the final result.
>
> Does the optimization apply to trees as well? Does a tree:<depth> filter still
> want to consider children of tree X if tree X has already been excluded by
> another filter? If it doesn't want to consider them, we can short-circuit the
> checks very aggressively. If it does want to consider them, we want the
> short-circuiting to be customizable at least for trees.
>
> A minor point - I don't think that short-circuiting the for loop (breaking out
> early) is important, since it will be very rare that a combine: filter has more
> than 4 or so sub-filters anyway. Calling the filter_fn implementation and
> letting that do internal short-circuiting (informed by the previous filters'
> results) can, however, skip a lot of computation.
>
>> So you could add an "affects blobs only" bit to the per-filter data
>> and try this out.  For example a "compose:blob:none+sparse:foo" should
>> perform better than "compose:sparse:foo+blob:none" but give the same
>> results.
>
> Does "affects blobs only" mean the filter includes all non-blob objects?


I just meant that the blob:none and blob:limit filters give you a hard
omit.  Other filters later in the chain cannot change or override that
answer (because of the AND assumption); it doesn't matter how deep or
shallow the blob is in the tree.
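
As a rough, untested illustration of the hard-omit idea (every name below
is hypothetical and none of it is the actual list-objects-filter.c code),
an AND-combined loop could stop as soon as one sub-filter returns a hard
omit:

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not git's real API. */
enum sketch_type { SKETCH_TREE, SKETCH_BLOB };

enum sketch_result {
	SKETCH_INCLUDE,
	SKETCH_OMIT_PROVISIONAL, /* may be revisited if the object shows up again */
	SKETCH_OMIT_HARD         /* final: later filters cannot override it */
};

struct sketch_object {
	enum sketch_type type;
};

typedef enum sketch_result (*sketch_filter_fn)(const struct sketch_object *obj);

struct sketch_filter {
	sketch_filter_fn fn;
	int calls; /* bookkeeping so the effect of ordering is visible */
};

/* Stand-in for blob:none: every blob is a hard omit. */
static enum sketch_result sketch_blob_none(const struct sketch_object *obj)
{
	return obj->type == SKETCH_BLOB ? SKETCH_OMIT_HARD : SKETCH_INCLUDE;
}

/* Stand-in for some costlier filter; here it simply includes everything. */
static enum sketch_result sketch_other(const struct sketch_object *obj)
{
	(void)obj;
	return SKETCH_INCLUDE;
}

/* AND-combine the sub-filters, stopping at the first hard omit. */
static enum sketch_result sketch_combine(struct sketch_filter *subs, size_t nr,
					 const struct sketch_object *obj)
{
	enum sketch_result result = SKETCH_INCLUDE;
	size_t i;

	for (i = 0; i < nr; i++) {
		enum sketch_result r;

		subs[i].calls++;
		r = subs[i].fn(obj);
		if (r == SKETCH_OMIT_HARD)
			return SKETCH_OMIT_HARD;
		if (r == SKETCH_OMIT_PROVISIONAL)
			result = SKETCH_OMIT_PROVISIONAL;
	}
	return result;
}
```

With the sub-filters ordered { sketch_blob_none, sketch_other }, the second
one is never called for a blob; in the reverse order it is called once, and
the combined answer is the same either way.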

In the case of the tree:depth filter, a blob deep in the tree should
be provisionally omitted in case it appears later in a shallow tree
and should be included.  The tree filter can't do a hard omit on a blob
(just like it can't do a hard omit on a tree node).
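
To make "provisionally omitted" concrete, here is a small untested sketch.
The names are made up, and it assumes tree:<depth> keeps an object only
when it is seen at a depth below the limit (root tree at depth 0); it is
not how the real filter stores its state:

```c
/* Hypothetical per-object state; not git's real data structures. */
enum sketch_state {
	SKETCH_UNSEEN,
	SKETCH_OMITTED_PROVISIONALLY,
	SKETCH_INCLUDED
};

struct sketch_entry {
	enum sketch_state state;
};

/*
 * Record one sighting of an object at the given depth.
 * A deep sighting only omits provisionally; a later shallow
 * sighting flips the object to included, and it stays included.
 */
static enum sketch_state sketch_visit(struct sketch_entry *e,
				      int depth, int max_depth)
{
	if (e->state == SKETCH_INCLUDED)
		return e->state;

	if (depth < max_depth)
		e->state = SKETCH_INCLUDED;
	else
		e->state = SKETCH_OMITTED_PROVISIONALLY;

	return e->state;
}
```

A blob first seen at depth 5 under a limit of 2 is only provisionally
omitted; if the same blob turns up again at depth 1, it becomes included,
which is why this filter cannot hand back a hard omit.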

WRT your question about a later filter "just wanting to know" about
a blob, I'm not sure.

So yeah, let's wait on this.  We can always add it later as an
optimization if/when it becomes a perf problem (and we have more
experience using them in practice).

Jeff




