Re: [PATCH] revision.c: reduce object database queries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 24, 2018 at 08:34:56PM -0500, Derrick Stolee wrote:

> In mark_parents_uninteresting(), we check for the existence of an
> object file to see if we should treat a commit as parsed. The result
> is to set the "parsed" bit on the commit.
> 
> Modify the condition to only check has_object_file() if the result
> would change the parsed bit.
> 
> When a local branch is different from its upstream ref, "git status"
> will compute ahead/behind counts. This uses paint_down_to_common()
> and hits mark_parents_uninteresting(). On a copy of the Linux repo
> with a local instance of "master" behind the remote branch
> "origin/master" by ~60,000 commits, we find the performance of
> "git status" went from 1.42 seconds to 1.32 seconds, for a relative
> difference of -7.0%.

This looks like an obvious and easy optimization, and I'm OK with this
patch.

But looking at the existing code, it makes me wonder a few things:

> diff --git a/revision.c b/revision.c
> index 5ce9b93..bc7def5 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -113,7 +113,8 @@ void mark_parents_uninteresting(struct commit *commit)
>  			 * it is popped next time around, we won't be trying
>  			 * to parse it and get an error.
>  			 */
> -			if (!has_object_file(&commit->object.oid))
> +			if (!commit->object.parsed &&
> +			    !has_object_file(&commit->object.oid))
>  				commit->object.parsed = 1;

We don't actually need the object contents at all right here. This is
just faking the "parsed" flag for later so that calls to parse_object()
don't barf.

This code comes originally form 454fbbcde3 (git-rev-list: allow missing
objects when the parent is marked UNINTERESTING, 2005-07-10). But later,
in aeeae1b771 (revision traversal: allow UNINTERESTING objects to be
missing, 2009-01-27), we marked dealt with calling parse_object() on the
parents more directly.

So what I wonder is whether this code is simply redundant and can go
away entirely. That would save the has_object_file() call in all cases.
We'd stop setting the fake commit->object.parsed flag, which in theory
means we'd make repeated failed calls to parse_commit_gently() if we
visited the same commit over and over. I wonder if add_parents_to_list()
should skip already-UNINTERESTING parents, which would mean we only try
to access each missing candidate once (OTOH missing ones are probably
rare enough that it doesn't make much difference either way).

Probably the fake commit->object.parsed flag isn't hurting anything in
practice, but it seems pretty nasty to lie about having loaded the
commit in the global store. E.g., imagine a follow-up traversal in the
same process in which that commit _isn't_ UNINTERESTING, and we'd
erroneously treat it as an available commit with no parents.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux