Re: GSoC - Some questions on the idea of

On 4/2/2012 4:07 PM, Jeff King wrote:

...I think we need to first find out exactly
how well the generic algorithm can perform. It may be "good enough"
compared to the hassle that inconsistent application of a content-aware
algorithm will cause.  So I wouldn't rule it out, but I'd rather try the
bup-style splitting first, and see how good (or bad) it is.

(I read the bup DESIGN doc to see what bup-style splitting is.) When you use bup's delta technology in git.git, I take it you will use it both for files that are big in the worktree *and* for files that are big in history (files that are not big in the worktree but do not delta well with xdelta)? IOW, all binaries plus big text files. Otherwise, small binaries will grow into large histories.
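
For anyone else who has not read the DESIGN doc, here is a minimal sketch of the general idea behind content-defined splitting: a chunk ends wherever a rolling hash of the last few bytes hits a chosen value, so boundaries follow the content rather than fixed offsets. This is not bup's actual rollsum; the polynomial hash, the names, and the WINDOW and DIVISOR values are all assumptions picked for illustration.

```python
WINDOW = 48                   # bytes covered by the rolling hash (assumed value)
DIVISOR = 2048                # roughly one boundary per 2 KiB on random data (assumed value)
BASE = 257                    # multiplier for the polynomial hash (assumed value)
MOD = (1 << 31) - 1           # prime modulus for the polynomial hash
OUT = pow(BASE, WINDOW, MOD)  # weight of the byte that leaves the window


def split(data):
    """Split bytes into chunks whose boundaries depend only on nearby content."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Slide the window: add the incoming byte ...
        h = (h * BASE + byte) % MOD
        # ... and, once the window is full, drop the outgoing byte.
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * OUT) % MOD
        # Declare a boundary when the hash lands on the chosen residue.
        if h % DIVISOR == DIVISOR - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder without a boundary
    return chunks
```

Because each boundary depends only on the preceding WINDOW bytes, inserting a few bytes into a file changes only the chunks near the insertion; the chunks before and after it come out identical and could be stored once.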

If small binaries are not going to be bup-delta-compressed, then what about using xxd to convert the binary to text, and then xdelta-compressing the hex dump to get efficient delta compression in the pack file? You could convert the hex dump back to binary with xxd for checkout and such.
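
To make the round trip concrete, here is a small sketch of the binary-to-hex-text conversion and back, similar in spirit to `xxd -p` and `xxd -r -p`. The function names and the 16-bytes-per-line width are my own assumptions, and this says nothing about whether the hex form actually deltas better.

```python
import binascii


def to_hex_text(blob, width=16):
    """Render bytes as lines of plain hex, `width` bytes per line."""
    hexed = binascii.hexlify(blob).decode("ascii")
    step = width * 2  # two hex digits per byte
    lines = [hexed[i:i + step] for i in range(0, len(hexed), step)]
    return "\n".join(lines) + "\n"


def from_hex_text(text):
    """Rebuild the original bytes, ignoring line breaks and other whitespace."""
    return binascii.unhexlify("".join(text.split()))
```

Two things to keep in mind: the hex text is about twice the size of the input, and with a fixed line width a single inserted byte shifts the contents of every later line.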

Maybe small binaries delta well with xdelta and the above is a moot point. This is all theory to me, but the reality is looming over my head, since most of the components I should be tracking are binaries, both small (large history?) and big. I am not tracking them yet because of "big-file" concerns: I don't want to have to refactor my vast git ecosystem with filter-branch later because I slammed binaries into the main project or superproject without proper systems programming. (I'm not sure what the C/Linux term for 'systems programming' is, but in the mainframe world it meant making sure everything was configured for efficient performance.)

Now that I say that out loud, I guess a superproject with binaries in separate repos could be easily refactored by creating new, efficient repos and making a new commit that points to them instead of the old, inefficient repos. That way, when someone checks out the binary repo (submodule) into their worktree, they get the new efficiency instead of the old inefficiency. Over time, as folks become less likely to check out old stuff, the old inefficiency goes away on its own. I think. (Submodules are mostly theory to me at this point, too.)

v/r,
neal