Re: [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()`

On Wed, Nov 08, 2023 at 08:05:46AM +0100, Patrick Steinhardt wrote:
> > @@ -277,6 +278,18 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
> >  	return ret;
> >  }
> >
> > +int tmp_objdir_repack(struct tmp_objdir *t)
> > +{
> > +	struct child_process cmd = CHILD_PROCESS_INIT;
> > +
> > +	cmd.git_cmd = 1;
> > +
> > +	strvec_pushl(&cmd.args, "repack", "-a", "-d", "-k", "-l", NULL);
> > +	strvec_pushv(&cmd.env, tmp_objdir_env(t));
>
> I wonder what performance of this repack would be like in a large
> repository with many refs. Ideally, I would expect that the repacking
> performance should scale with the number of objects we have written into
> the temporary object directory. But in practice, the repack will need to
> compute reachability and thus also scales with the size of the repo
> itself, doesn't it?

Good question. We definitely do not want to be doing an all-into-one
repack as a consequence of running 'git replay' in a large repository
with lots of refs, objects, or both.

But since we push the result of calling `tmp_objdir_env(t)` into the
environment of the child process, we are only repacking the objects in
the temporary directory, not the main object store.

I have a test that verifies this is the case by making sure that, in a
repository with some arbitrary set of pre-existing packs, only one
pack is added to that set after running 'replay', and that the
pre-existing packs remain in place.

Thanks,
Taylor



