On Fri, Feb 24, 2017 at 05:18:36PM -0800, Jonathan Tan wrote:

> Whenever tree_objects is set to 1 in revision.h's struct rev_info,
> blob_objects is likewise set, and vice versa. Combine those two fields
> into one.
>
> Some of the existing code does not handle tree_objects being different
> from blob_objects properly. For example, "handle_commit" in revision.c
> recurses from an UNINTERESTING tree into its subtree if tree_objects ==
> 1, completely ignoring blob_objects; it probably should still recurse if
> tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should
> behave differently depending on blob_objects (controlling the
> instantiation and marking of blob objects). This commit resolves the
> issue by forbidding tree_objects from being different to blob_objects.

Yeah, I agree that is awkward. I'm OK with the rule "if blob_objects is
set, then tree_objects must also be set". It's the other way around I
care more about.

> It could be argued that in the future, Git might need to distinguish
> tree_objects from blob_objects - in particular, a user might want
> rev-list to print the trees but not the blobs. However, this results in
> a minor performance savings at best in that objects no longer need to be
> instantiated (causing memory allocations and hashtable insertions) - no
> disk reads are being done for objects whether blob_objects is set or
> not.

In a full object-graph traversal, we actually spend a big chunk of our
time in hash lookups. My measurements (admittedly from 2013, which I
haven't repeated lately) show something like a 20-25% speedup for this
case.

My only use for it (and the source of those timings) was to compute
archive reachability, which nobody seems to care too much about. But I
suspect we could speed up your case, too, when we are just computing the
reachability of a non-blob.
I.e., you should be able to turn on the smallest subset of "commits
only", "commits and trees", and "commits, trees, and blobs", based on
what the other side has asked for.

-Peff
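
For illustration only, the "smallest subset" idea might be sketched
roughly as below. This is not code from the patch or from revision.c:
the enum, the stand-in struct, and both helper names are hypothetical,
and only the field names tree_objects and blob_objects come from the
thread. It assumes the rule discussed above, namely that blob_objects
implies tree_objects but not the other way around.

```c
/* Hypothetical: the deepest object type the requester needs walked. */
enum walk_level {
	WALK_COMMITS,             /* commits only */
	WALK_COMMITS_TREES,       /* commits and trees */
	WALK_COMMITS_TREES_BLOBS  /* commits, trees, and blobs */
};

/*
 * Stand-in for the two fields discussed in the thread.
 * This is NOT the real struct rev_info.
 */
struct walk_flags {
	unsigned int tree_objects : 1;
	unsigned int blob_objects : 1;
};

/* Turn on the smallest set of flags that covers the requested level. */
static struct walk_flags flags_for_level(enum walk_level level)
{
	struct walk_flags f = { 0, 0 };

	if (level >= WALK_COMMITS_TREES)
		f.tree_objects = 1;
	if (level == WALK_COMMITS_TREES_BLOBS)
		f.blob_objects = 1;
	return f;
}

/* Check the rule: blob_objects set implies tree_objects set. */
static int flags_are_valid(struct walk_flags f)
{
	return !f.blob_objects || f.tree_objects;
}
```

With this shape, a caller that only needs the reachability of a tree
would ask for WALK_COMMITS_TREES and never enable blob_objects, while
the combination "blobs without trees" is simply not constructible
through flags_for_level().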