Takuto Ikuta <tikuta@xxxxxxxxxxxx> writes:

> Yes, I just wanted to say that 'git fetch' invokes fetch-pack.
> fetch-pack is skipped by quickfetch when git fetch is run
> repeatedly while the remote has no update, so I disabled it to see
> the performance of fetch-pack. In the chromium repository, the
> master branch is updated several times an hour, so git fetch
> invokes fetch-pack at that frequency.

I do understand that if you run "git fetch" against the same place
in quick succession three times, the first one may cost a lot (not
just that it has to do the everything_local() computation that you
care about, it also has to actually do the network transfer), while
the second and third ones that are done before anything new happens
on the other end will not involve the everything_local() overhead,
thanks to quickfetch.

A "fetch" that is run against a remote that has nothing new, but
still triggers everything_local() only because quickfetch is
disabled, is an artificial case that has no relevance to the real
world, I suspect, because the quickfetch optimization is there to
solve the "there is nothing to be done, yet do_fetch_pack() spends
so many cycles only to realize that it does not have anything to
do, why?" issue.

Isn't running the "git fetch" command with the "--dry-run" option
many times in quick succession a lot closer to what you really want
to measure, I wonder? That way, your first fetch won't be touching
the state of the local side to affect your second and subsequent
fetches.

>> In any case, do_fetch_pack() tries to see if all of the tip
>> commits we are going to fetch exist locally, so when you are
>> trying a fetch that grabs a huge number of refs (by the way, it
>> means that the first sentence of the proposed log message is not
>> quite true---it is "When fetching a large number of refs", as it
>> does not matter how many refs _we_ have, no?), everything_local()
>> ends up making repeated calls to has_object_file_with_flags() for
>> all of the refs.
>
> I fixed the description by changing 'refs' to 'remote refs'. In my
> understanding, git tries to check the existence of remote refs
> even if we won't fetch those refs.

During a fetch, everything_local() tries to mark the common part by
walking the refs the other side advertised upon the first contact,
so it is correct that the number of checks is not reduced in a
fetch that does not grab many refs, but the number of
remote-tracking refs you have has no effect, so I doubt such a
rephrasing would make the description more understandable. "When
fetching from a repository with a large number of refs" is probably
what you want to say, no?

> Yes. But I think the default limit for the number of loose
> objects, 7000, gives us a small overhead when we do the
> enumeration of all objects.

Hmph, I didn't see the code that estimates the loose object count
before starting to enumerate, though.

>> Hmm, the OBJECT_INFO_QUICK optimization was added in dfdd4afc
>> ("sha1_file: teach sha1_object_info_extended more flags",
>> 2017-06-21), but since 8b4c0103 ("sha1_file: support lazily
>> fetching missing objects", 2017-12-08) it appears that passing
>> OBJECT_INFO_QUICK down the codepath does not do anything
>> interesting. Jonathan (cc'ed), are all remaining hits from "git
>> grep OBJECT_INFO_QUICK" dead no-ops these days?
>
> Yes, the flag is a no-op now, but let me leave the flag untouched
> in this patch.

Yeah, I do not want you to be touching that in this change. It is
an independent/orthogonal clean-up.

Thanks.
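
P.S. To make "many times in quick succession" concrete, I mean
something along these lines (just an illustrative sketch, not a
rigorous benchmark; substitute whatever remote you are actually
measuring against for 'origin'):

	# repeat the same fetch a few times; --dry-run is meant to
	# leave the local refs untouched between iterations
	for i in 1 2 3 4 5
	do
		time git fetch --dry-run origin
	done

Because "--dry-run" does not update any refs, the second and later
iterations see the same local ref state as the first one did.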
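
P.P.S. As for the loose object count, something like

	# the "count:" line in the output is the number of loose objects
	git count-objects -v

would tell you where a given repository actually stands relative to
that limit.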