2018-03-09 3:42 GMT+09:00 Junio C Hamano <gitster@xxxxxxxxx>: > Takuto Ikuta <tikuta@xxxxxxxxxxxx> writes: >> This patch stores existing loose objects in hashmap beforehand and use >> it to check existence instead of using lstat. >> >> With this patch, the number of lstat calls in `git fetch` is reduced >> from 411412 to 13794 for chromium repository. >> >> I took time stat of `git fetch` disabling quickfetch for chromium >> repository 3 time on linux with SSD. > > Now you drop a clue that would help to fill in the blanks above, but > I am not sure what the significance of your having to disable > quickfetch in order to take measurements---it makes it sound as if > it is an articificial problem that does not exist in real life > (i.e. when quickfetch is not disabled), but I am getting the feeling > that it is not what you wanted to say here. > Yes, I just wanted to say 'git fetch' invokes fetch-pack. fetch-pack is skipped when running git fetch repeatedly while remote has no update by quickfetch. So I disabled it to see the performance of fetch-pack. In chromium repository, master branch is updated several times in an hour, so git fetch invokes fetch-pack in such frequency. > In any case, do_fetch_pack() tries to see if all of the tip commits > we are going to fetch exist locally, so when you are trying a fetch > that grabs huge number of refs (by the way, it means that the first > sentence of the proposed log message is not quite true---it is "When > fetching a large number of refs", as it does not matter how many > refs _we_ have, no?), everything_local() ends up making repeated > calls to has_object_file_with_flags() to all of the refs. > I fixed description by changing 'refs' to 'remote refs'. In my understanding, git tries to check existence of remote refs even if we won't fetch such refs. > I like the idea---this turns "for each of these many things, check > if it exists with lstat(2)" into "enumerate what exists with > lstat(2), and then use that for the existence test"; if you need to > try N objects for existence, and you only have M objects loose where > N is vastly larger than M, it will be a huge win. If you have very > many loose objects and checking only a handful of objects for > existence check, you would lose big, though, no? > Yes. But I think the default limit for the number of loose objects, 7000, gives us small overhead when we do enumeration of all objects. >> diff --git a/fetch-pack.c b/fetch-pack.c >> index d97461296..1658487f7 100644 >> --- a/fetch-pack.c >> +++ b/fetch-pack.c >> @@ -711,6 +711,15 @@ static void mark_alternate_complete(struct object *obj) >> mark_complete(&obj->oid); >> } >> >> +static int add_loose_objects_to_set(const struct object_id *oid, >> + const char *path, >> + void *data) >> +{ >> + struct oidset* set = (struct oidset*)(data); > > Style: in our codebase, asterisk does not stick to the type. > > struct oidset *set = (struct oidset *)(data); > >> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args *args, >> int retval; >> int old_save_commit_buffer = save_commit_buffer; >> timestamp_t cutoff = 0; >> + struct oidset loose_oid_set = OIDSET_INIT; >> + >> + for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0); > > OK, so this is the "enumerate all loose objects" phase. > >> save_commit_buffer = 0; >> >> for (ref = *refs; ref; ref = ref->next) { >> struct object *o; >> + unsigned int flag = OBJECT_INFO_QUICK; > > Hmm, OBJECT_INFO_QUICK optimization was added in dfdd4afc > ("sha1_file: teach sha1_object_info_extended more flags", > 2017-06-21), but since 8b4c0103 ("sha1_file: support lazily fetching > missing objects", 2017-12-08) it appears that passing > OBJECT_INFO_QUICK down the codepath does not do anything > interesting. Jonathan (cc'ed), are all remaining hits from "git > grep OBJECT_INFO_QUICK" all dead no-ops these days? > Yes the flag is no-op now, but let me untouched the flag in this patch. >> diff --git a/sha1_file.c b/sha1_file.c >> index 1b94f39c4..c903cbcec 100644 >> --- a/sha1_file.c >> +++ b/sha1_file.c >> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, >> if (find_pack_entry(real, &e)) >> break; >> >> + if (flags & OBJECT_INFO_SKIP_LOOSE) >> + return -1; >> + > > I cannot quite convince myself that this is done at the right layer; > it smells to be at a bit too low a layer. This change makes sense > only to a caller that is interested in the existence test. If the > flag is named after what it does, i.e. "ignore loose object", then > it does sort-of make sense, though. > Couldn't come up with good alternative for this, I followed your flag name suggestion in patch v3. Takuto