Re: [PATCH v0 3/3] Bigfile: teach "git add" to send a large file straight to a pack

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 09 May 2011 09:14:09 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> I envisioned that the "API" I talked about in the NEEDSWORK you quoted
> would keep an open file descriptor to the "currently being built" packfile
> wrapped in a "struct packed_git", with an in-core index_data that is
> adjusted every time you add a straight-to-pack kind of object. Upon a
> "finalize" call, it would determine the final pack name, write the real
> pack .idx file out, and rename the "being built" packfile to the final
> name to make it available to the outside world.
>
> Within a single git process that approach would give access to the set of
> objects that are going straight to the pack.  When it needs to spawn a git
> subprocess, it however would need to finalize the pack to give access to
> the new object, just like when fast-import flushes when asked to expose
> the marks.
>
> After all, this topic is about handling large binary files that would not
> fit in core at once (we do not support them now at all). It may not be too
> bad to say we stuff one object per packfile and immediately close the
> packfile (which is what the use of fast-import by the POC patch
> does).
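
To make the shape of that "API" concrete, here is a rough sketch of the
open/add/finalize life cycle.  Everything in it is an assumption made up
for illustration: the names (bulk_pack_open, bulk_pack_add,
bulk_pack_finalize), the toy FNV-style hash standing in for real object
names and the pack checksum, and the on-disk layout, which is neither the
real pack format nor the real .idx format.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is git's actual API. */
struct idx_entry {
	uint32_t id;	/* toy object id, standing in for a real object name */
	off_t offset;	/* where the object starts in the pack being built */
};

struct bulk_pack {
	int fd;				/* kept open while the pack is being built */
	char tmp_name[256];		/* the "being built" file name */
	struct idx_entry *entries;	/* in-core index data */
	size_t nr, alloc;
	uint32_t checksum;		/* toy running hash over everything written */
};

/* FNV-1a style mixing; a placeholder, NOT what git uses. */
static uint32_t toy_hash(uint32_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	while (len--)
		h = (h ^ *p++) * 16777619u;
	return h;
}

int bulk_pack_open(struct bulk_pack *p, const char *dir)
{
	p->entries = NULL;
	p->nr = p->alloc = 0;
	p->checksum = 2166136261u;
	snprintf(p->tmp_name, sizeof(p->tmp_name), "%s/tmp_pack_XXXXXX", dir);
	p->fd = mkstemp(p->tmp_name);
	return p->fd < 0 ? -1 : 0;
}

/* Append one object and adjust the in-core index to match. */
int bulk_pack_add(struct bulk_pack *p, const void *buf, size_t len)
{
	assert(p->fd >= 0);
	off_t offset = lseek(p->fd, 0, SEEK_CUR);

	if (offset < 0 || write(p->fd, buf, len) != (ssize_t)len)
		return -1;
	if (p->nr == p->alloc) {
		size_t alloc = p->alloc ? 2 * p->alloc : 8;
		struct idx_entry *e = realloc(p->entries, alloc * sizeof(*e));
		if (!e)
			return -1;
		p->entries = e;
		p->alloc = alloc;
	}
	p->entries[p->nr].id = toy_hash(2166136261u, buf, len);
	p->entries[p->nr].offset = offset;
	p->nr++;
	p->checksum = toy_hash(p->checksum, buf, len);
	return 0;
}

/* Decide the final name, write the index out, then rename into place. */
int bulk_pack_finalize(struct bulk_pack *p, const char *dir,
		       char *final_name, size_t size)
{
	char idx_name[512];
	FILE *idx;
	size_t i;

	snprintf(final_name, size, "%s/pack-%08x.pack", dir, (unsigned)p->checksum);
	snprintf(idx_name, sizeof(idx_name), "%s/pack-%08x.idx", dir, (unsigned)p->checksum);
	idx = fopen(idx_name, "w");
	if (!idx)
		return -1;
	for (i = 0; i < p->nr; i++)
		fprintf(idx, "%08x %lld\n", (unsigned)p->entries[i].id,
			(long long)p->entries[i].offset);
	if (fclose(idx) || close(p->fd))
		return -1;
	free(p->entries);
	p->entries = NULL;
	p->nr = p->alloc = 0;
	p->fd = -1;
	/* Only after this rename is the pack visible to other processes. */
	return rename(p->tmp_name, final_name);
}
```

In this sketch the index is written before the rename, so the pack never
shows up under its final name without an index next to it.  A caller that
is about to spawn a git subprocess would call the finalize step first, for
the reason given in the quoted text.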

A (tentatively final) side note.

The primary reason why I wanted to think about using a single packfile
that is kept open and add multiple objects to the pack was because we may
later want to use this kind of set-up for "initial import", regardless of
the size of the object being added.  But now that I think about it, I do
not think that use case matters a lot.  The resulting single pack would
have much worse object density than what you get by adding the objects
normally, i.e. initially creating loose object files and then running
repack/gc, at which time you are likely to have more than one rev to
sanely deltify against.

Using one pack per large object at creation time is not too bad to begin
with.  If you had a large enough core to hold such a large binary file,
the current system would store it as a single loose object file, so it is
not like we are making things any worse.  In either form, this "single
object per file" initial storage will find its more permanent home upon
the first repack/gc.