Jeff King <peff@xxxxxxxx> writes:

> I don't remember all of the details of bup, but if it's possible to
> implement something similar at a lower level (i.e., at the layer of
> packfiles or object storage), then it can be a purely local thing, and
> the compatibility issues can go away.

I tend to agree, and we might be closer than we realize.

I suspect that people with large binary assets were scared away by
rumors they heard second-hand, based on bad experiences other people
had before any of the recent work on the various "large Git" topics,
and they themselves haven't tried recent versions of Git enough to
tell what the remaining pain points are.  I wouldn't be surprised if
none of the core Git people have tried shoving huge binary assets
into test repositories with recent versions of Git---I certainly
haven't.

We used to always map the blob data as a whole for anything we do,
but these days, with changes like your abb371a (diff: don't retrieve
binary blobs for diffstat, 2011-02-19) and my recent "send large blob
straight to a new pack" and "stream large data out to the working
tree without holding everything in core while checking out" topics, I
suspect that support for local use of large blobs is already quite a
bit better than it was in the old days.  Git might even be usable
locally without anything else---I find that implausible, but I
wouldn't be surprised if only a handful of minor things remained for
us to add before it is.

People have toyed with ideas for a separate object store
representation for large and possibly incompressible blobs (one
possible complaint being that it is pointless to deflate them even
into their own packfile).  One possible implementation would be to
add a new "huge" hierarchy under $GIT_DIR/objects/ and compute the
object name for huge blobs exactly the same way we normally would
(i.e. hash the concatenation of the object header and then the
contents), using it to decide which subdirectory under the "huge"
hierarchy stores the data (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we do
for loose objects, or perhaps huge/[0-9a-f]{40}/, expecting that
there won't be very many).  The data can be stored unmodified as a
file in that directory, with the type stored in a separate file---
that way, we never have to compress, we just copy.  You still need to
hash the data at least once to come up with the object name, but that
is what gives us integrity checks; it is unavoidable and is not going
to change.

The sha1_object_info() layer can learn to return the type and size
from such a representation, and you can further tweak the same places
the "streaming checkout" and the "checkin to a pack" topics touched
to support such a representation.

I would suspect that the local object representation is _not_ the
largest pain point; such a separate object store representation does
not buy us very much over a simpler "single large blob in a separate
packfile", and if the counter-argument is "no, decompressing still
costs a lot", then the real issue might be that we decompress and
look at the data when we do not have to (i.e. issues similar to what
abb371a addressed), not that "decompress vs. straight copy" makes a
big difference.  I would further suspect that we _might_ need better
support for local repacking and object transfer, with or without such
a third object representation.
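
To make the objects/huge/ idea above a bit more concrete, here is a
minimal sketch, in Python rather than Git's C purely for illustration;
the helper name store_huge_blob and the "data"/"type" file names
inside the object directory are this sketch's own assumptions, not
anything Git implements:

import hashlib, os, shutil

def store_huge_blob(path, git_dir=".git"):
    # The object name is computed exactly as for a normal blob:
    # SHA-1 over "blob <size>\0" followed by the raw contents.
    size = os.path.getsize(path)
    h = hashlib.sha1()
    h.update(b"blob %d\0" % size)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    sha1 = h.hexdigest()

    # Hypothetical huge/[0-9a-f]{2}/[0-9a-f]{38}/ layout: the data is
    # copied unmodified, with the type kept in a small side file.
    objdir = os.path.join(git_dir, "objects", "huge", sha1[:2], sha1[2:])
    os.makedirs(objdir, exist_ok=True)
    shutil.copyfile(path, os.path.join(objdir, "data"))
    with open(os.path.join(objdir, "type"), "w") as f:
        f.write("blob\n")
    return sha1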
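
The lookup side that sha1_object_info() would learn, answering "what
type and how big?" without ever touching the data, could then be as
simple as this (again only a sketch under the same assumed layout):

import os

def huge_object_info(sha1, git_dir=".git"):
    # Read the type from the side file and the size via stat(),
    # without opening (let alone decompressing) the data itself.
    objdir = os.path.join(git_dir, "objects", "huge", sha1[:2], sha1[2:])
    with open(os.path.join(objdir, "type")) as f:
        obj_type = f.read().strip()
    size = os.path.getsize(os.path.join(objdir, "data"))
    return obj_type, size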