On 4/2/2012 4:07 PM, Jeff King wrote:
> ...I think we need to first find out exactly
> how well the generic algorithm can perform. It may be "good enough"
> compared to the hassle that inconsistent application of a content-aware
> algorithm will cause. So I wouldn't rule it out, but I'd rather try the
> bup-style splitting first, and see how good (or bad) it is.
(I read the bup DESIGN doc to see what bup-style splitting is.) When you
use bup-style splitting in git.git, I take it that you will use it for
both big-worktree-files *and* big-history-files (files that are not big
in the worktree but are not xdelta-delta-friendly)? IOW, all binaries
plus big text worktree files. Otherwise, small binaries will accumulate
large histories.
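A minimal sketch of the bup-style splitting idea, for readers who have not seen it: a rolling checksum over a small sliding window decides where to cut a file into chunks, so an insertion only disturbs the chunks near it and the remaining chunks can be shared between versions. The 64-byte window, the 13-bit split rule, the rsync-style checksum, and the name split_chunks are assumptions for illustration; bup's actual rollsum and its tree/fanout layout differ in detail.

```python
import random
from typing import List

WINDOW = 64      # bytes in the rolling window (assumed for this sketch)
SPLIT_BITS = 13  # cut when the low 13 bits are all ones: ~8 KiB average chunks
MASK = (1 << SPLIT_BITS) - 1


def split_chunks(data: bytes) -> List[bytes]:
    """Hypothetical content-defined chunker using an rsync-style rolling sum.

    A boundary goes after byte i when the checksum of the WINDOW bytes ending
    at i has its low SPLIT_BITS bits set. The decision depends only on that
    window, so an edit only moves boundaries within WINDOW bytes of it.
    """
    a = 0  # plain sum of the bytes in the window
    b = 0  # age-weighted sum: newest byte weight 1, oldest weight WINDOW
    chunks = []
    start = 0
    for i, byte in enumerate(data):
        out = data[i - WINDOW] if i >= WINDOW else 0  # byte leaving the window
        a = a - out + byte
        b = b - WINDOW * out + a  # every remaining byte ages by one
        if b & MASK == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # tail after the last boundary
    return chunks


# Usage: insert one byte near the start and count chunks that survive unchanged.
rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
edited = original[:10] + b"X" + original[10:]
old = split_chunks(original)
new = split_chunks(edited)
new_set = set(new)
shared = sum(1 for c in old if c in new_set)
print(shared, "of", len(old), "chunks reusable after the edit")
```

With whole-file storage, the same one-byte edit would produce an entirely new blob; here only the chunk(s) around the edit change.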
If small binaries are not going to be bup-split, then what about using
xxd to convert each binary to a hex dump and letting xdelta compress the
hex dump, to get efficient delta compression in the pack file? You could
convert the hex dump back to binary with xxd -r for checkout and such.
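The xxd round trip can be sketched in Python, mimicking xxd -p (plain hex, 30 bytes per line by default) and xxd -r -p; the function names are hypothetical. One thing worth checking before relying on the idea: git's pack deltas already operate on raw bytes, and a fixed-width hex dump re-wraps after any insertion, so a one-byte insert at the front changes every line of the dump. The usage below demonstrates that.

```python
LINE_BYTES = 30  # xxd -p prints 30 bytes (60 hex digits) per line by default


def to_plain_hex(data: bytes) -> str:
    """Roughly what `xxd -p` prints: lowercase hex, LINE_BYTES bytes per line."""
    lines = [data[i:i + LINE_BYTES].hex() for i in range(0, len(data), LINE_BYTES)]
    return "".join(line + "\n" for line in lines)


def from_plain_hex(text: str) -> bytes:
    """Roughly what `xxd -r -p` does: drop whitespace, decode hex pairs."""
    return bytes.fromhex("".join(text.split()))


# Usage: round-trip a binary, then insert one byte at the front.
data = bytes(range(256)) * 4
dump = to_plain_hex(data)
assert from_plain_hex(dump) == data  # "checkout" restores the original bytes

edited_dump = to_plain_hex(b"\x00" + data)
old_lines = set(dump.splitlines())
new_lines = set(edited_dump.splitlines())
print(len(old_lines & new_lines), "of", len(new_lines), "lines unchanged")
# prints: 0 of 35 lines unchanged
```

The dump is also a little more than twice the size of the binary, so any gain would have to come from the delta step rather than the conversion itself.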
Maybe small binaries already delta well with xdelta, and the above is a
moot point. This is all theory to me, but the reality is looming over my
head: most of the components I should be tracking are binaries, some
small (with large histories?) and some big. I am not tracking them yet
because of "big-file" concerns. I don't want to have to refactor my vast
git ecosystem with filter-branch later because I slammed binaries into
the main project or superproject without proper systems programming.
(I'm not sure what the C/Linux term for 'systems programming' is, but in
the mainframe world it meant making sure everything was configured for
efficient performance.)
Now that I say that out loud, I guess a superproject with binaries in
separate repos could easily be refactored by creating new, efficient
repos and making a new commit that points to them instead of the old,
inefficient repos. That way, when someone checks out the binary repo
(submodule) into their worktree, they get the new efficiency instead of
the old inefficiency. Over time, as folks become less likely to check
out old stuff, the old inefficiency goes away on its own. I think.
(Submodules are mostly theory to me at this point also.)
v/r,
neal
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html