On Wed, Nov 08, 2023 at 08:05:46AM +0100, Patrick Steinhardt wrote:
> > @@ -277,6 +278,18 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
> >  	return ret;
> >  }
> >
> > +int tmp_objdir_repack(struct tmp_objdir *t)
> > +{
> > +	struct child_process cmd = CHILD_PROCESS_INIT;
> > +
> > +	cmd.git_cmd = 1;
> > +
> > +	strvec_pushl(&cmd.args, "repack", "-a", "-d", "-k", "-l", NULL);
> > +	strvec_pushv(&cmd.env, tmp_objdir_env(t));
>
> I wonder what performance of this repack would be like in a large
> repository with many refs. Ideally, I would expect that the repacking
> performance should scale with the number of objects we have written into
> the temporary object directory. But in practice, the repack will need to
> compute reachability and thus also scales with the size of the repo
> itself, doesn't it?

Good question. We definitely do not want to be doing an all-into-one
repack as a consequence of running 'git replay' in a large repository
with lots of refs, objects, or both.

But since we push the result of calling `tmp_objdir_env(t)` into the
environment of the child process, we are only repacking the objects in
the temporary directory, not the main object store.

I have a test that verifies this: in a repository with an arbitrary set
of pre-existing packs, it checks that running 'replay' adds exactly one
pack to that set and that the pre-existing packs remain in place.

Thanks,
Taylor