Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

On Fri, Feb 24, 2017 at 05:18:36PM -0800, Jonathan Tan wrote:

> Whenever tree_objects is set to 1 in revision.h's struct rev_info,
> blob_objects is likewise set, and vice versa. Combine those two fields
> into one.
> 
> Some of the existing code does not handle tree_objects being different
> from blob_objects properly. For example, "handle_commit" in revision.c
> recurses from an UNINTERESTING tree into its subtree if tree_objects ==
> 1, completely ignoring blob_objects; it probably should still recurse if
> tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should
> behave differently depending on blob_objects (controlling the
> instantiation and marking of blob objects). This commit resolves the
> issue by forbidding tree_objects from being different to blob_objects.

Yeah, I agree that is awkward. I'm OK with the rule "if blob_objects is
set, then tree_objects must also be set". It's the other way around
(tree_objects set without blob_objects) that I care more about.

> It could be argued that in the future, Git might need to distinguish
> tree_objects from blob_objects - in particular, a user might want
> rev-list to print the trees but not the blobs. However, this results in
> a minor performance savings at best in that objects no longer need to be
> instantiated (causing memory allocations and hashtable insertions) - no
> disk reads are being done for objects whether blob_objects is set or
> not.

In a full object-graph traversal, we actually spend a big chunk of our
time in hash lookups. My measurements (admittedly from 2013, which I
haven't repeated lately) show something like a 20-25% speedup from
walking trees without blobs.

My only use for it (and the source of those timings) was to compute
archive reachability, which nobody seems to care too much about. But I
suspect we could speed up your case, too, when we are just computing the
reachability of a non-blob. I.e., you should be able to turn on the
smallest subset of "commits only", "commits and trees", and "commits,
trees, and blobs", based on what the other side has asked for.
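
As a rough sketch of that selection (not actual Git code): the struct
walk_flags, enum want_depth, and set_walk_flags() names below are all
hypothetical stand-ins, with walk_flags mirroring only the two rev_info
bits discussed here. It assumes the caller already knows the deepest
object type the other side asked about.

```c
#include <assert.h>

/* Hypothetical stand-in for the two relevant bits of struct rev_info. */
struct walk_flags {
	unsigned int tree_objects:1;
	unsigned int blob_objects:1;
};

/* Hypothetical: the deepest object type the other side asked about. */
enum want_depth {
	WANT_COMMITS,	/* commits only */
	WANT_TREES,	/* commits and trees */
	WANT_BLOBS	/* commits, trees, and blobs */
};

/* Turn on the smallest subset of object types that covers "depth". */
static void set_walk_flags(struct walk_flags *f, enum want_depth depth)
{
	f->tree_objects = (depth >= WANT_TREES);
	f->blob_objects = (depth >= WANT_BLOBS);

	/*
	 * Blobs without trees is the combination the existing code
	 * does not handle, so it is never produced here.
	 */
	assert(!f->blob_objects || f->tree_objects);
}
```

With this shape, "trees but not blobs" stays expressible, while "blobs
but not trees" cannot be requested at all.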

-Peff


