On Wed, Mar 10, 2021 at 04:39:22PM -0500, Jeff King wrote:
> On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
>
> > Altogether, this ends up with the following queries, both of which have
> > been executed in a well-packed linux.git repository:
> >
> >     # Previous query which uses object names as a heuristic to filter
> >     # non-blob objects, which bars us from using bitmap indices because
> >     # they cannot print paths.
> >     $ time git rev-list --objects --filter=blob:limit=200 \
> >         --object-names --all | sed -r '/^.{,41}$/d' | wc -l
> >     4502300
> >
> >     real 1m23.872s
> >     user 1m30.076s
> >     sys 0m6.002s
> >
> >     # New query.
> >     $ time git rev-list --objects --filter-provided \
> >         --filter=object:type=blob --filter=blob:limit=200 \
> >         --use-bitmap-index --all | wc -l
> >     22585
> >
> >     real 0m19.216s
> >     user 0m16.768s
> >     sys 0m2.450s
>
> Those produce very different answers. I guess because in the first one,
> you still have a bunch of tree objects, too. You'd do much better to get
> the actual types from cat-file, and filter on that. That also lets you
> use bitmaps for the traversal portion. E.g.:
>
>   $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
>     git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
>     perl -lne 'print $1 if /^blob (.*)/' | wc -l
>   14966
>
>   real 0m6.248s
>   user 0m7.810s
>   sys 0m0.440s
>
> which is faster than what you showed above (this is on linux.git, but my
> result is different; maybe you have more refs than me?). But we should
> be able to do better purely internally, so I suspect my computer is just
> faster (or maybe your extra refs just aren't well-covered by bitmaps).
> Running with your patches I get:
>
>   $ time git rev-list --objects --use-bitmap-index --all \
>       --filter-provided --filter=object:type=blob \
>       --filter=blob:limit=200 | wc -l
>   16339
>
>   real 0m1.309s
>   user 0m1.234s
>   sys 0m0.079s
>
> which is indeed faster. It's quite curious that the answer is not the
> same, though! I think yours has some bugs. If I sort and diff the
> results, I see some commits mentioned in the output. Perhaps this is
> --filter-provided not working, as they all seem to be ref tips.
[snip]

I've found the issue: when converting filters to a combined filter via
`transform_to_combine_type()`, we reset the top-level filter via a call
to `memset()`. So for combined filters, the option wouldn't have taken
any effect because it got reset iff the `--filter-provided` option comes
before the second filter.

Patrick
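
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual Git code: `struct filter_opts`, `add_filter()`, `buggy_to_combine()` and `fixed_to_combine()` are hypothetical, simplified stand-ins for the real filter options and `transform_to_combine_type()`. It only models how a `memset()` on the top-level struct can drop a flag that was parsed earlier, and one possible way to carry the flag across the reset.

```c
#include <string.h>

/* Hypothetical stand-ins; the real Git structures are more involved. */
enum filter_kind {
	FILTER_NONE,		/* must be 0 so a zeroed struct means "no filter" */
	FILTER_BLOB_LIMIT,
	FILTER_OBJECT_TYPE,
	FILTER_COMBINE
};

struct filter_opts {
	enum filter_kind choice;
	int filter_provided;		/* models the --filter-provided option */
	enum filter_kind sub[4];	/* sub-filters of a combined filter */
	int sub_nr;
};

typedef void (*to_combine_fn)(struct filter_opts *);

/* Models the bug: zeroing the whole struct also clears filter_provided. */
static void buggy_to_combine(struct filter_opts *f)
{
	enum filter_kind first = f->choice;

	memset(f, 0, sizeof(*f));
	f->choice = FILTER_COMBINE;
	f->sub[f->sub_nr++] = first;
}

/* One possible fix (an assumption): save the flag and restore it. */
static void fixed_to_combine(struct filter_opts *f)
{
	int provided = f->filter_provided;

	buggy_to_combine(f);
	f->filter_provided = provided;
}

/* The first filter is stored directly; the second triggers the conversion. */
static void add_filter(struct filter_opts *f, enum filter_kind kind,
		       to_combine_fn to_combine)
{
	if (f->choice == FILTER_NONE) {
		f->choice = kind;
		return;
	}
	if (f->choice != FILTER_COMBINE)
		to_combine(f);
	f->sub[f->sub_nr++] = kind;
}

/* Option order: --filter-provided --filter=A --filter=B */
static int flag_before_filters(to_combine_fn to_combine)
{
	struct filter_opts f;

	memset(&f, 0, sizeof(f));
	f.filter_provided = 1;
	add_filter(&f, FILTER_OBJECT_TYPE, to_combine);
	add_filter(&f, FILTER_BLOB_LIMIT, to_combine);
	return f.filter_provided;
}

/* Option order: --filter=A --filter=B --filter-provided */
static int flag_after_filters(to_combine_fn to_combine)
{
	struct filter_opts f;

	memset(&f, 0, sizeof(f));
	add_filter(&f, FILTER_OBJECT_TYPE, to_combine);
	add_filter(&f, FILTER_BLOB_LIMIT, to_combine);
	f.filter_provided = 1;
	return f.filter_provided;
}
```

In this model, `flag_before_filters(buggy_to_combine)` loses the flag while `flag_after_filters(buggy_to_combine)` keeps it, which matches the order dependence described above. With `fixed_to_combine`, the flag survives in both orders.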