On Mon, Apr 12, 2021 at 07:47:41PM -0400, Jeff King wrote: > On Mon, Apr 12, 2021 at 11:36:53PM +0200, SZEDER Gábor wrote: > > > All what you wrote above makes sense to me, but I've never looked at > > how partial clones work, so it doesn't mean anything... In any case > > your patch brings great speedups, but, unfortunately, the memory usage > > remains as high as it was: > > > > $ /usr/bin/time --format=elapsed: %E max RSS: %Mk /home/szeder/src/git/bin-wrappers/git -C git-partial.git/ gc > > Enumerating objects: 188450, done. > > Counting objects: 100% (188450/188450), done. > > Delta compression using up to 4 threads > > Compressing objects: 100% (66623/66623), done. > > Writing objects: 100% (188450/188450), done. > > Total 188450 (delta 120109), reused 188450 (delta 120109), pack-reused 0 > > elapsed: 0:15.18 max RSS: 1888332k > > > > And git.git is not all that large, I wonder how much memory would be > > necessary to 'gc' a 'blob:none' clone of e.g. chromium?! :) > > > > BTW, this high memory usage in a partial clone is not specific to > > 'repack', 'fsck' suffers just as much: > > I think the issue is in the exclude-promisor-object code paths of the > traversal. Try this: > > [two clones of git.git, one full and one partial] > $ git clone --no-local --bare /path/to/git full.git > $ git clone --no-local --bare --filter=blob:none /path/to/git partial.git > > [full clone is quick to traverse all objects] > $ time git -C full.git rev-list --count --all > 63215 > real 0m0.369s > user 0m0.365s > sys 0m0.004s > > [partial is, too; it's the same amount of work because we're just > looking at the commits here] > $ time git -C partial.git rev-list --count --all > 63215 > real 0m0.373s > user 0m0.364s > sys 0m0.009s > > [but now ask it to exclude promisor objects, and it's much slower; > this is because is_promisor_object() opens up each tree in the pack in > order to see which "promised" objects it mentions] I don't understand this: 'git rev-list --count --all' only counts commit objects, so why should it open any trees at all? > $ time git -C partial.git rev-list --exclude-promisor-objects --count --all > 0 > real 0m11.723s > user 0m11.354s > sys 0m0.369s > > And I think that is the source for the memory use, too. Without that > option, we peak at 11MB heap. With it, 1.6GB. Oops. Looks like there > might be some "leaks" when we parse tree objects and then leave the > buffers connected to the structs (technically not a leak because we > still have the pointers, but obviously having every tree in memory at > once is not good). > > The patch below drops the peak heap to 165MB. Still quite a bit more, > but I think it's a combination of delta-base cache (96MB) plus extra > structs for all the non-commit objects whose flags we marked. > > It does seem like there's probably a good space/time tradeoff to be made > here (e.g., caching the list of "promisor" object ids for the pack > instead of inflating and reading all of the trees on the fly). > > diff --git a/packfile.c b/packfile.c > index 8668345d93..b79cbc8cd4 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -2247,6 +2247,7 @@ static int add_promisor_object(const struct object_id *oid, > return 0; > while (tree_entry_gently(&desc, &entry)) > oidset_insert(set, &entry.oid); > + free_tree_buffer(tree); > } else if (obj->type == OBJ_COMMIT) { > struct commit *commit = (struct commit *) obj; > struct commit_list *parents = commit->parents; > diff --git a/revision.c b/revision.c > index 553c0faa9b..fac2577748 100644 > --- a/revision.c > +++ b/revision.c > @@ -3271,8 +3271,15 @@ static int mark_uninteresting(const struct object_id *oid, > void *cb) > { > struct rev_info *revs = cb; > + /* > + * yikes, do we really need to parse here? maybe Heh, a "yikes" here and a "yuck" in your previous patch... This issue was worth reporting :) > + * lookup_unknown_object() would be sufficient, or > + * even oid_object_info() followed by the correct type > + */ > struct object *o = parse_object(revs->repo, oid); > o->flags |= UNINTERESTING | SEEN; > + if (o->type == OBJ_TREE) > + free_tree_buffer((struct tree *)o); > return 0; > } > > > -Peff