On Mon, Nov 13, 2023 at 2:05 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Fri, Nov 10, 2023 at 08:24:44PM -0500, Taylor Blau wrote:
>
> > > This does mean that for a recursive merge, that you'll get up to 2*N
> > > packfiles, where N is the depth of the recursive merge.
> >
> > We definitely want to avoid that ;-). I think there are a couple of
> > potential directions forward here, but the most promising one I think is
> > due to Johannes who suggests that we write loose objects into a
> > temporary directory with a replace_tmp_objdir() call, and then repack
> > that side directory before migrating a single pack back into the main
> > object store.
>
> I posted an alternative in response to Elijah; the general idea being to
> allow the usual object-lookup code to access the in-progress pack. That
> would keep us limited to a single pack.
>
> It _might_ be a terrible idea. E.g., if you write a non-bulk object that
> references a bulk one, then that non-bulk one is broken from the
> perspective of other processes (until the bulk checkin is flushed). But
> I think we'd always be writing to one or the other here, never
> interleaving?

I think it sounds like a really cool idea, personally.  I also don't see
why any current uses would result in interleaving.

fast-import's created pack has the same kind of restriction...and has
even grown some extensions to work around the need for early access.  In
particular, it has a `checkpoint` feature for flushing the current pack
and starting a new one, and also gained the `cat-blob` command for when
folks really wanted to access an object without writing out the whole
packfile.  So, _if_ the need for early access arises, we have some prior
art to look to for ways to provide it.
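For anyone following along, here is a minimal sketch (not from the thread;
the stream contents are made up for illustration) of those two fast-import
escape hatches: `checkpoint` flushes the in-progress pack mid-stream, and
`cat-blob` prints an object's contents before any pack is finalized.

```shell
#!/bin/sh
# Toy demo of fast-import's workarounds for early access to objects that
# are still in its not-yet-finalized pack: `checkpoint` closes the current
# packfile and starts a new one, and `cat-blob` emits an object's contents
# without waiting for the pack to be finished.
set -e

repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# cat-blob responses go to stdout by default (see --cat-blob-fd).
# The stream writes one 6-byte blob ("hello\n"), checkpoints, then asks
# for the blob back by its mark.
out=$(git fast-import --quiet <<'EOF'
blob
mark :1
data 6
hello
checkpoint
cat-blob :1
EOF
)
printf '%s\n' "$out"
```

The cat-blob response has the form `<sha1> SP 'blob' SP <size> LF
<contents> LF`, so the snippet prints the blob's id, its size, and
"hello" even though no final pack has been migrated yet.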