On Wed, Aug 24, 2011 at 17:17, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> I don't remember all of the details of bup, but if it's possible to
>> implement something similar at a lower level (i.e., at the layer of
>> packfiles or object storage), then it can be a purely local thing, and
>> the compatibility issues can go away.
>
> I tend to agree, and we might be closer than we realize.
>
> I suspect that people with large binary assets were scared away by
> rumors they heard second-hand, based on bad experiences other people had
> before any of the recent efforts made in various "large Git" topics, and
> they themselves haven't tried recent versions of Git enough to be able
> to tell what the remaining pain points are. I wouldn't be surprised if
> none of the core Git people have tried shoving huge binary assets into
> test repositories with recent versions of Git---I certainly haven't.
>
> We used to always map the blob data as a whole for anything we did, but
> these days, with changes like your abb371a (diff: don't retrieve binary
> blobs for diffstat, 2011-02-19) and my recent "send large blob straight
> to a new pack" and "stream large data out to the working tree without
> holding everything in core while checking out" topics, I suspect that
> the support for local usage of large blobs might be sufficiently better
> than in the old days. Git might even already be usable locally without
> anything else (which I find implausible), but I wouldn't be surprised if
> only a handful of minor things remained that we need to add to make it
> usable.
>
> People have toyed around with ideas for a separate object store
> representation for large and possibly incompressible blobs (a possible
> complaint being that it is pointless to send them even to their own
> packfile). One possible implementation would be to add a new "huge"
> hierarchy under $GIT_DIR/objects/ and compute the object name for huge
> blobs exactly the same way we normally would (i.e. hash the
> concatenation of the object header and then the contents) to decide
> which subdirectory under the "huge" hierarchy to store the data in
> (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we do for loose objects, or perhaps
> huge/[0-9a-f]{40}/ expecting that there won't be very many). The data
> can be stored unmodified as a file in that directory, with the type
> stored in a separate file---that way we don't have to compress; we just
> copy. You still need to hash the data at least once to come up with the
> object name, but that is what gives us integrity checks; it is
> unavoidable and is not going to change.
>
> The sha1_object_info() layer can learn to return the type and size from
> such a representation, and you can further tweak the same places that
> the "streaming checkout" and "checkin to a pack" topics touched to
> support such a representation.
>
> I suspect that the local object representation is _not_ the largest pain
> point; such a separate object store representation does not buy us very
> much over a simpler "single large blob in a separate packfile", and if
> the counter-argument is "no, decompressing still costs a lot", then the
> real issue might be that we decompress and look at the data when we do
> not have to (i.e. issues similar to what abb371a addressed), not that
> "decompress vs. straight copy" makes a big difference.

I've added Avery to the Cc list because he really needs to chime in here.
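
For what it's worth, my reading of the "huge" hierarchy Junio describes
above boils down to something like the following rough Python sketch.
None of this is code Git actually ships; the data/type file layout and
the helper names are my own guesses for illustration:

    # Sketch of the proposed objects/huge/ store; the layout and names
    # are hypothetical, only the object-name computation matches Git's.
    import hashlib
    import os
    import shutil

    def huge_object_name(path, obj_type="blob"):
        """Hash the "<type> <size>" header, a NUL, then the contents,
        exactly as for a normal loose object."""
        size = os.path.getsize(path)
        h = hashlib.sha1()
        h.update(("%s %d" % (obj_type, size)).encode() + b"\x00")
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def store_huge_object(git_dir, path, obj_type="blob"):
        """Copy the payload unmodified under objects/huge/xx/yyyy...,
        with the type kept in a separate small file."""
        sha1 = huge_object_name(path, obj_type)
        obj_dir = os.path.join(git_dir, "objects", "huge", sha1[:2], sha1[2:])
        os.makedirs(obj_dir, exist_ok=True)
        shutil.copyfile(path, os.path.join(obj_dir, "data"))  # no deflate, just copy
        with open(os.path.join(obj_dir, "type"), "w") as f:
            f.write(obj_type + "\n")
        return sha1

Because the name is computed exactly as for a loose object, it should
match what `git hash-object` prints for the same file, so
sha1_object_info() could learn to peek into objects/huge/ without
changing how names are computed; only the on-disk payload stops being
deflated.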
I am completely unqualified to comment on this, but I think it would be
silly to ignore the insights Avery has about storing large objects;
`bup' uses rolling checksums, a bloom-filter implementation, and who
knows what else.
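
To give a flavour of what bup does differently, below is a toy
content-defined chunker built around an rsync-style rolling sum. It is
purely illustrative and is not bup's actual bupsplit or bloom-filter
code; the constants and function name are made up for the example:

    # Toy content-defined chunking; boundaries depend only on nearby
    # bytes, so a small edit only disturbs the chunks around it.
    WINDOW = 64        # bytes in the rolling window
    SPLIT_BITS = 13    # cut roughly every 8 KiB of input on average

    def split_chunks(data):
        """Return (offset, length) chunk boundaries for a bytes object."""
        chunks, start = [], 0
        s1 = s2 = 0                   # plain sum and weighted sum
        window = bytearray(WINDOW)    # window starts as virtual zeros
        mask = (1 << SPLIT_BITS) - 1
        for i, new in enumerate(data):
            old = window[i % WINDOW]
            window[i % WINDOW] = new
            s1 = (s1 + new - old) & 0xFFFF
            s2 = (s2 + s1 - WINDOW * old) & 0xFFFF
            if (s2 & mask) == mask:   # low bits all ones: cut here
                chunks.append((start, i + 1 - start))
                start = i + 1
        if start < len(data):
            chunks.append((start, len(data) - start))
        return chunks

The point is that identical regions of a huge file keep producing the
same chunks even after an insertion elsewhere, which is what lets bup
store new revisions of large binaries cheaply.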