On Fri, Mar 20, 2020 at 03:00:45PM -0700, Jonathan Tan wrote: > With 50033772d5 ("connected: verify promisor-ness of partial clone", > 2020-01-30), the fast path (checking promisor packs) in > check_connected() now passes a subset of the slow path (rev-list) - if > all objects to be checked are found in promisor packs, both the fast > path and the slow path will pass; otherwise, the fast path will > definitely not pass. This means that we can always attempt the fast path > whenever we need to do the slow path. > > The fast path is currently guarded by a flag; therefore, remove that > flag. Also, make the fast path fallback to the slow path - if the fast > path fails, the failing OID and all remaining OIDs will be passed to > rev-list. It looks like a pretty simple change. I had one probably-biased complaint about gotos below, otherwise it looks reasonable to me. > > The main user-visible benefit is the performance of fetch from a partial > clone - specifically, the speedup of the connectivity check done before > the fetch. In particular, a no-op fetch into a partial clone on my > computer was sped up from 7 seconds to 0.01 seconds. This is a > complement to the work in 2df1aa239c ("fetch: forgo full > connectivity check if --filter", 2020-01-30), which is the child of the > aforementioned 50033772d5. In that commit, the connectivity check > *after* the fetch was sped up. > > The addition of the fast path might cause performance reductions in > these cases: > > - If a partial clone or a fetch into a partial clone fails, Git will > fruitlessly run rev-list (it is expected that everything fetched > would go into promisor packs, so if that didn't happen, it is most > likely that rev-list will fail too). > > - Any connectivity checks done by receive-pack, in the (in my opinion, > unlikely) event that a partial clone serves receive-pack. > > I think that these cases are rare enough, and the performance reduction > in this case minor enough (additional object DB access), that the > benefit of avoiding a flag outweighs these. > > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > --- > This is the second half of the work I did previously [1]. Quoting from > [1]: > > > For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of > > the remaining time is spent in yet another connectivity check > > (fetch_refs -> check_exist_and_connected) prior to the fetch - that will > > hopefully be done in a subsequent patch. > > This is the subsequent patch. (Note that the timings were done on > another computer, so don't compare the timings from [1] and this patch > directly.) > > [1] https://lore.kernel.org/git/be1d6aa4c4fd8868f3682b73c01a92d3830534ad.1578802317.git.jonathantanmy@xxxxxxxxxx/ > --- > builtin/clone.c | 7 ++----- > builtin/fetch.c | 7 ------- > connected.c | 9 +++++++-- > connected.h | 9 --------- > 4 files changed, 9 insertions(+), 23 deletions(-) > > diff --git a/builtin/clone.c b/builtin/clone.c > index 1ad26f4d8c..4b2b14ff61 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -672,8 +672,7 @@ static void update_remote_refs(const struct ref *refs, > const char *branch_top, > const char *msg, > struct transport *transport, > - int check_connectivity, > - int check_refs_are_promisor_objects_only) > + int check_connectivity) > { > const struct ref *rm = mapped_refs; > > @@ -682,8 +681,6 @@ static void update_remote_refs(const struct ref *refs, > > opt.transport = transport; > opt.progress = transport->progress; > - opt.check_refs_are_promisor_objects_only = > - !!check_refs_are_promisor_objects_only; > > if (check_connected(iterate_ref_map, &rm, &opt)) > die(_("remote did not send all necessary objects")); > @@ -1275,7 +1272,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) > > update_remote_refs(refs, mapped_refs, remote_head_points_at, > branch_top.buf, reflog_msg.buf, transport, > - !is_local, filter_options.choice); > + !is_local); > > update_head(our_head_points_at, remote_head, reflog_msg.buf); > > diff --git a/builtin/fetch.c b/builtin/fetch.c > index bf6bab80fa..1097e1e512 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -908,13 +908,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, > if (!connectivity_checked) { > struct check_connected_options opt = CHECK_CONNECTED_INIT; > > - if (filter_options.choice) > - /* > - * Since a filter is specified, objects indirectly > - * referenced by refs are allowed to be absent. > - */ > - opt.check_refs_are_promisor_objects_only = 1; > - > rm = ref_map; > if (check_connected(iterate_ref_map, &rm, &opt)) { > rc = error(_("%s did not send all necessary objects\n"), url); > diff --git a/connected.c b/connected.c > index 7e9bd1bc62..846f2e4eef 100644 > --- a/connected.c > +++ b/connected.c > @@ -52,7 +52,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data, > strbuf_release(&idx_file); > } > > - if (opt->check_refs_are_promisor_objects_only) { > + if (has_promisor_remote()) { > /* > * For partial clones, we don't want to have to do a regular > * connectivity check because we have to enumerate and exclude > @@ -71,13 +71,18 @@ int check_connected(oid_iterate_fn fn, void *cb_data, > if (find_pack_entry_one(oid.hash, p)) > goto promisor_pack_found; > } > - return 1; > + /* > + * Fallback to rev-list with oid and the rest of the > + * object IDs provided by fn. > + */ > + goto no_promisor_pack_found; > promisor_pack_found: > ; > } while (!fn(cb_data, &oid)); > return 0; > } > > +no_promisor_pack_found: Having a look at the final structure of the loop with these gotos, I'm a little confused. Could be this isn't C-idiomatic but I think the code could be easier to read with helpers instead of gotos. I realize it's longer but I have a hard time understanding that your gotos are used to double-continue or double-break; nested loops tend to make me want to use helpers. But - I'm a lowly barely-reformed C++ developer, so what do I know ;) int oid_in_promisor(oid) { for (p = get_all_packs(the_repository); p; p = p->next) { if (!p->pack_promisor) continue; if (find_pack_entry_one(oid.hash, p) return 1; } } int all_oids_in_promisors(oid, fn, cb_data) { do { if (! oid_in_promisor(oid)) return 0; } while (!fn(cb_data, &oid)); return 1; } int check_connected(...) { ... if (has_promisor_remote()) { if (all_oids_in_promisors(oid, fn, cb_data)) return 0; if (opt->shallow_file) { ... } > if (opt->shallow_file) { > argv_array_push(&rev_list.args, "--shallow-file"); > argv_array_push(&rev_list.args, opt->shallow_file); > diff --git a/connected.h b/connected.h > index eba5c261ba..8d5a6b3ad6 100644 > --- a/connected.h > +++ b/connected.h > @@ -46,15 +46,6 @@ struct check_connected_options { > * during a fetch. > */ > unsigned is_deepening_fetch : 1; > - > - /* > - * If non-zero, only check that the top-level objects referenced by the > - * wanted refs (passed in as cb_data) are promisor objects. This is > - * useful for partial clones, where enumerating and excluding all > - * promisor objects is very slow and the commit-walk itself becomes a > - * no-op. > - */ > - unsigned check_refs_are_promisor_objects_only : 1; > }; > > #define CHECK_CONNECTED_INIT { 0 } > -- > 2.25.1.696.g5e7596f4ac-goog >