Re: Fwd: Git and Large Binaries: A Proposed Solution

I've been debating whether to resurrect this thread, but since it has been referenced by the SoC2011Ideas wiki article, I'll just go ahead.
I've spent a few hours trying to make this approach work, with the goal of making git with big files usable under Windows.

> Just a quick aside.  Since (a2b665d, 2011-01-05) you can provide
> the filename as an argument to the filter script:
> 
>     git config --global filter.huge.clean "huge-clean %f"
> 
> then use it in place:
> 
>     $ cat >huge-clean 
>     #!/bin/sh
>     f="$1"
>     echo orig file is "$f" >&2
>     sha1=`sha1sum "$f" | cut -d' ' -f1`
>     cp "$f" /tmp/big_storage/$sha1
>     rm -f "$f"
>     echo $sha1
> 
> 		-- Pete

First off, the commit mentioned here is no help at all. It changes nothing about the input and output of filters. The file is still loaded completely into memory, still streamed to the filter via stdin, and still streamed back from the filter via stdout into yet another memory buffer. The two buffers, IIRC, exist simultaneously for at least some time, thus doubling the memory requirements. All this change does is additionally provide the file name to the filter, nothing else. If one rereads the commit message carefully, that was apparently the intention.
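To illustrate the problem, here is a minimal standalone sketch of that data flow. This is not git's code; the "./huge-clean" command (borrowed from the quoted config above) and all the buffer handling are assumptions made purely for illustration. The whole work-tree file ends up in buffer #1, the filter output ends up in buffer #2, and both are held in memory at the same time, which is where the doubled footprint comes from.

/*
 * Sketch only: slurp the file, feed it to the filter's stdin,
 * collect the filter's stdout into a second buffer.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* read everything from fd into a growing heap buffer */
static char *slurp_fd(int fd, size_t *len)
{
	size_t cap = 8192, n = 0;
	char *buf = malloc(cap);
	ssize_t r;

	while (buf && (r = read(fd, buf + n, cap - n)) > 0) {
		n += r;
		if (n == cap)
			buf = realloc(buf, cap *= 2);
	}
	*len = n;
	return buf;
}

int main(int argc, char **argv)
{
	const char *filter = "./huge-clean";	/* assumed filter script */
	int to_child[2], from_child[2];
	size_t in_len, out_len;
	char *in_buf, *out_buf;
	FILE *f;

	if (argc < 2 || !(f = fopen(argv[1], "rb")))
		return 1;

	/* buffer #1: the complete original file */
	in_buf = slurp_fd(fileno(f), &in_len);
	fclose(f);

	pipe(to_child);
	pipe(from_child);

	if (fork() == 0) {		/* the filter process */
		dup2(to_child[0], 0);
		dup2(from_child[1], 1);
		close(to_child[1]);
		close(from_child[0]);
		execlp(filter, filter, argv[1], (char *)NULL);
		_exit(127);
	}
	close(to_child[0]);
	close(from_child[1]);

	if (fork() == 0) {		/* feeder: streams buffer #1 to the filter */
		size_t off = 0;
		close(from_child[0]);
		while (off < in_len) {
			ssize_t w = write(to_child[1], in_buf + off, in_len - off);
			if (w <= 0)
				break;
			off += w;
		}
		_exit(0);
	}
	close(to_child[1]);

	/* buffer #2: the filter output -- buffer #1 is still allocated here */
	out_buf = slurp_fd(from_child[0], &out_len);
	close(from_child[0]);
	while (wait(NULL) > 0)
		;

	printf("held %zu + %zu bytes in memory at the same time\n",
	       in_len, out_len);
	free(in_buf);
	free(out_buf);
	return 0;
}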

After this I started digging into the git source code. Changing the filter input would be trivial. However, the function that returns the filter output in a memory buffer is called from 8 places (all details from wetware memory and therefore unreliable). Most, maybe all, of the callers just dump the buffer into a file; that write could easily be relocated into the filter-calling function itself. But two callers detach the buffer from the strbuf and keep it around beyond writing the file. I didn't track this any further, since I decided to spend my time on improving big-file handling in git itself rather than on a workaround. Though of course a completely big-file-ready git should also provide a sane way to feed big files to and from filters.
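For comparison, here is a hypothetical sketch of the restructuring I mean; the function name and the "./huge-filter" command are made up and are not git's internal API, and it deliberately ignores feeding the filter's stdin to show only the output side. The write to the destination file lives inside the filter-calling function, which copies the filter's stdout in fixed-size chunks, so peak memory no longer scales with file size. The two callers that detach and keep the buffer would of course still need an in-memory path or a temporary file.

/*
 * Sketch only: the filter-calling function owns the destination file and
 * copies the filter's stdout into it in 64k chunks instead of returning
 * a strbuf, so callers that "just dump the buffer into a file" never see
 * a buffer at all.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* run `filter path` and stream its stdout straight into dest_fd */
static int filter_to_fd(const char *filter, const char *path, int dest_fd)
{
	int pipefd[2], status;
	char chunk[65536];
	ssize_t n;
	pid_t pid;

	if (pipe(pipefd) < 0)
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		dup2(pipefd[1], 1);
		close(pipefd[0]);
		execlp(filter, filter, path, (char *)NULL);
		_exit(127);
	}
	close(pipefd[1]);

	while ((n = read(pipefd[0], chunk, sizeof(chunk))) > 0) {
		ssize_t off = 0;
		while (off < n) {
			ssize_t w = write(dest_fd, chunk + off, n - off);
			if (w <= 0)
				return -1;
			off += w;
		}
	}
	close(pipefd[0]);
	waitpid(pid, &status, 0);
	return WIFEXITED(status) && !WEXITSTATUS(status) ? 0 : -1;
}

int main(int argc, char **argv)
{
	int fd;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <input> <output>\n", argv[0]);
		return 1;
	}
	fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return 1;
	/* the caller never holds the filter output in memory */
	if (filter_to_fd("./huge-filter", argv[1], fd) < 0)
		return 1;
	return close(fd) < 0;
}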

If the two detached buffers are no complication, this might be a trivial project. If they are, it might become demanding, though.

