On Tue, Nov 02, 2021 at 03:46:11PM +0000, Matt Cooper via GitGitGadget wrote:
> From: Matt Cooper <vtbassmatt@xxxxxxxxx>
>
> The filter system allows for alterations to file contents when they're

Some nit-picking: looking at
https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes
we can read
"...substitutions in files on commit/checkout."
Should we use this wording here as well?

> moved between the database and the worktree. We already made sure that
> it is possible for smudge filters to produce contents that are larger
> than `unsigned long` can represent (which matters on systems where
> `unsigned long` is narrower than `size_t`, most notably 64-bit Windows).
> Now we make sure that clean filters can _consume_ contents that are
> larger than that.
>
> Note that this commit only allows clean filters' _input_ to be larger
> than can be represented by `unsigned long`.
>
> This change makes only a very minute dent into the much larger project
> to teach Git to use `size_t` instead of `unsigned long` wherever
> appropriate.
>
> Helped-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> Signed-off-by: Matt Cooper <vtbassmatt@xxxxxxxxx>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> ---
>  convert.c                   |  2 +-
>  t/t1051-large-conversion.sh | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/convert.c b/convert.c
> index fd9c84b0257..5ad6dfc08a0 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -613,7 +613,7 @@ static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
>
>  struct filter_params {
>  	const char *src;
> -	unsigned long size;
> +	size_t size;
>  	int fd;
>  	const char *cmd;
>  	const char *path;
> diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
> index e6d52f98b15..042b0e44292 100755
> --- a/t/t1051-large-conversion.sh
> +++ b/t/t1051-large-conversion.sh
> @@ -98,4 +98,15 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
>  	test "$size" -eq $((5 * 1024 * 1024 * 1024 + $small_size))
>  '
>
> +# This clean filter writes down the size of input it receives. By checking against
> +# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows.
> +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> +		'files over 4GB convert on input' '
> +	test-tool genzeros $((5*1024*1024*1024)) >big &&
> +	test_config filter.checklarge.clean "wc -c >big.size" &&
> +	echo "big filter=checklarge" >.gitattributes &&
> +	git add big &&
> +	test $(test_file_size big) -eq $(cat big.size)
> +'
> +
>  test_done
> --
> gitgitgadget
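
For anyone following along, here is a minimal sketch of the truncation
that the `size_t` change in `struct filter_params` is meant to avoid.
It is only an illustration, not code from convert.c: `narrow_ulong` and
`stored_size()` are hypothetical names, and `uint32_t` is assumed as a
stand-in for the 32-bit `unsigned long` found on 64-bit Windows, so that
the effect can be seen on any platform.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the old field type. On 64-bit Windows,
 * `unsigned long` is 32 bits wide while `size_t` is 64 bits wide;
 * uint32_t models that narrower type portably.
 */
typedef uint32_t narrow_ulong;

/*
 * Return the value that a 32-bit `unsigned long` field would hold
 * after being assigned a 64-bit length. The conversion to an unsigned
 * type is well defined in C and wraps modulo 2^32.
 */
static uint64_t stored_size(uint64_t len)
{
	narrow_ulong size = (narrow_ulong)len;
	return size;
}
```

With the 5 GiB input used by the new test, `stored_size(5ULL * 1024 *
1024 * 1024)` yields 1073741824 (1 GiB), so a filter fed through a field
of the narrower type would see only a fraction of the file. Lengths
below 4 GiB pass through unchanged, which is presumably why the test
needs a file larger than that to catch the problem.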