Re: [PATCH v2 05/11] commit-graph: examine changed-path objects in pack order

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/18/2020 12:59 PM, Jakub Narebski wrote:
> "Jeff King via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> From: Jeff King <peff@xxxxxxxx>
>>
>> Looking at the diff of commit objects in pack order is much faster than
>> in sha1 order, as it gives locality to the access of tree deltas
> 
> Nitpick: should we still say sha1 order?  Git is still using SHA-1 as an
> *oid*, but hopefully soon it will be transitioning to NewHash = SHA-256.
> (No need to change anything.)
> 
>> (whereas sha1 order is effectively random). Unfortunately the
>> commit-graph code sorts the commits (several times, sometimes as an oid
>> and sometimes a pointer-to-commit), and we ultimately traverse in sha1
>> order.
> 
> Actually, commit-graph code needs write_commit_graph_context.commits.list
> to be in lexicographical order to be able to turn position in graph into
> reference to a commit.  The information about the parents of the commit
> are stored using positional references within the graph file.
> 

You are right. Fixing the commit message in v3. 

>>
>> Instead, let's remember the position at which we see each commit, and
>> traverse in that order when looking at bloom filters. This drops my time
>> for "git commit-graph write --changed-paths" in linux.git from ~4
>> minutes to ~1.5 minutes.
> 
> Nitpick: with reordering of patches (which I think is otherwise a good
> thing) this patch actually comes before the one adding "--changed-paths"
> option to "git commit-graph write".  So it 'This would drop my time'
> rather than 'This drops my time...' ;-)
> 

:) I will fix that up. 

>>
>> Probably the "--reachable" code path would want something similar.
> 
> Has anyone tried doing this?
> 

I will and I will include the perf numbers in the appropriately in v3. 


>> +
>>  char *get_commit_graph_filename(const char *obj_dir)
>>  {
>>  	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
>> @@ -1027,6 +1051,8 @@ static int add_packed_commits(const struct object_id *oid,
>>  	oidcpy(&(ctx->oids.list[ctx->oids.nr]), oid);
>>  	ctx->oids.nr++;
>>  
>> +	set_commit_pos(ctx->r, oid);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -1147,6 +1173,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>>  {
>>  	int i;
>>  	struct progress *progress = NULL;
>> +	struct commit **sorted_by_pos;
> 
> In the next patch in series we would sort commits by generation number
> and creation data; shouldn't this variable name be more generic to
> reflect this, for example just `sorted_commits` or `commits_sorted`?
> 

Good call. I will clean this up in both commits. 

Thanks for the review! 
Cheers! 
Garima Singh



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux