Re: [PATCH v3 4/8] t1051: introduce a smudge filter test for extremely large files

"Matt Cooper via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Matt Cooper <vtbassmatt@xxxxxxxxx>
>
> The filter system allows for alterations to file contents when they're
> added to the database or workdir. ("Smudge" when moving to the workdir;
> "clean" when moving to the database.) This is used natively to handle CRLF
> to LF conversions. It's also employed by Git-LFS to replace large files
> from the workdir with small tracking files in the repo and vice versa.

Not a huge deal, but please make it a habit to write "working tree" rather
than "workdir", as someday you may write end-user-facing documentation in
our tree ;-).
 
> Git pulls the entire smudged file into memory.

Saying what it is read into memory for would help readers, e.g.

    Git reads the entire smudged file into memory to convert it into
    a "clean" form to be used in-core.

> While this is inefficient,
> there's a more insidious problem on some platforms due to inconsistency
> between using unsigned long and size_t for the same type of data (size of
> a file in bytes). On most 64-bit platforms, unsigned long is 64 bits, and
> size_t is typedef'd to unsigned long. On Windows, however, unsigned long is
> only 32 bits (and therefore on 64-bit Windows, size_t is typedef'd to
> unsigned long long in order to be 64 bits).
>
> Practically speaking, this means 64-bit Windows users of Git-LFS can't
> handle files larger than 2^32 bytes. Other 64-bit platforms don't suffer
> this limitation.
>
> This commit introduces a test exposing the issue; future commits make it
> pass. The test simulates the way Git-LFS works by having a tiny file
> checked into the repository and expanding it to a huge file on checkout.
>
> Helped-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> Signed-off-by: Matt Cooper <vtbassmatt@xxxxxxxxx>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> ---
>  t/t1051-large-conversion.sh | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
> index 8b7640b3ba8..bff86c13208 100755
> --- a/t/t1051-large-conversion.sh
> +++ b/t/t1051-large-conversion.sh
> @@ -83,4 +83,18 @@ test_expect_success 'ident converts on output' '
>  	test_cmp small.clean large.clean
>  '
>  
> +# This smudge filter prepends 5GB of zeros to the file it checks out. This
> +# ensures that smudging doesn't mangle large files on 64-bit Windows.
> +test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> +		'files over 4GB convert on output' '
> +	test_commit test small "a small file" &&
> +	test_config filter.makelarge.smudge \
> +		"test-tool genzeros $((5*1024*1024*1024)) && cat" &&
> +	echo "small filter=makelarge" >.gitattributes &&
> +	rm small &&
> +	git checkout -- small &&
> +	size=$(test_file_size small) &&
> +	test "$size" -ge $((5 * 1024 * 1024 * 1024))
> +'

Why is anything that is at least 5G OK, instead of requiring exactly 5G?
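
For reference, a rough sketch of the arithmetic behind this question. It
assumes the smudged file is the 5GB of zeros followed by the original
content (the filter runs genzeros and then cat), and that test_commit
writes "a small file" with a trailing newline; the variable names are
made up for illustration and are not part of the patch.

```shell
# Size of the zero prefix the "makelarge" filter emits (5 GiB).
five_gb=$((5 * 1024 * 1024 * 1024))

# Original content of "small"; the +1 assumes a trailing newline.
content="a small file"
small_len=$((${#content} + 1))

# Exact size expected after smudging: prefix plus original content.
exact=$((five_gb + small_len))

# What would remain if the size were squeezed through a 32-bit
# unsigned long (i.e. taken modulo 2^32).
wrapped=$((exact % 4294967296))

echo "prefix:  $five_gb"
echo "exact:   $exact"
echo "wrapped: $wrapped"
```

Under those assumptions the wrapped value is far below 5G, so the -ge
comparison still detects truncation, while an exact comparison would
also need to account for the length of the original content.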

Thanks.

>  test_done


