Part third (and last) of the review of v8 11/11. W dniu 20.09.2016 o 21:02, larsxschneider@xxxxxxxxx napisał: [...] > diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl > new file mode 100755 > index 0000000..c13a631 > --- /dev/null > +++ b/contrib/long-running-filter/example.pl [...] > diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh > index dc50938..210c4f6 100755 > --- a/t/t0021-conversion.sh > +++ b/t/t0021-conversion.sh One thing that could have been done as yet another preparatory patch would be to modernize existing t/t0021-conversion.sh tests. For example use here-doc instead of series of echo-s, use cp to copy files and not echo, etc. > @@ -31,7 +31,10 @@ test_expect_success setup ' > cat test >test.i && > git add test test.t test.i && > rm -f test test.t test.i && > - git checkout -- test test.t test.i > + git checkout -- test test.t test.i && > + > + echo "content-test2" >test2.o && > + echo "content-test3 - subdir" >"test3 - subdir.o" I see that you prepare here a few uncommitted files, but both their names and their contents leave much to be desired - you don't know from the name and contents what they are for. And the '"subdir"' file which is not in subdirectory is especially egregious. > ' > > script='s/^\$Id: \([0-9a-f]*\) \$/\1/p' > @@ -279,4 +282,364 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' ' > test_line_count = 0 count > ' > A small comment on parameters of this function would be nice. > +check_filter () { > + rm -f rot13-filter.log actual.log && > + "$@" 2> git_stderr.log && > + test_must_be_empty git_stderr.log && > + cat >expected.log && This is too clever by half. Having a function that both tests the behavior and prepares 'expected' file is too much. In my opinion preparation of 'expected.log' file should be moved to another function or functions. Also, if we are running sort on output, I think we should also run sort on 'expected.log', so that what we write doesn't need to be created sorted (so we don't have to sort expected lines by hand). Or maybe we should run the same transformation on rot13-filter.log and on the contents of expected.log. > + sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" >actual.log && > + test_cmp expected.log actual.log > +} > + > +check_filter_count_clean () { > + rm -f rot13-filter.log actual.log && > + "$@" 2> git_stderr.log && > + test_must_be_empty git_stderr.log && All those functions (well, wait?) have common setup, which we can extract into separate shell function, I think. IMVHO. > + cat >expected.log && > + sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" | > + sed "s/^\([0-9]\) IN: clean/x IN: clean/" >actual.log && > + test_cmp expected.log actual.log > +} > + > +check_filter_ignore_clean () { > + rm -f rot13-filter.log actual.log && > + "$@" && Why we don't check for stderr here? > + cat >expected.log && > + grep -v "IN: clean" rot13-filter.log >actual.log && > + test_cmp expected.log actual.log > +} > + > +check_filter_no_call () { > + rm -f rot13-filter.log && > + "$@" 2> git_stderr.log && > + test_must_be_empty git_stderr.log && > + test_must_be_empty rot13-filter.log > +} > + A small comment on parameters of this function would be nice. And a comment what it does. > +check_rot13 () { > + test_cmp "$1" "$2" && > + ./../rot13.sh <"$1" >expected && Why there is .. in this invocation? > + git cat-file blob :"$2" >actual && > + test_cmp expected actual > +} > + > +test_expect_success PERL 'required process filter should filter data' ' > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.required true && > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && Don't you think that creating a fresh test repository for each separate test is a bit too much? I guess that you want for each and every test to be completely independent, but this setup and teardown is a bit excessive. Other tests in the same file (should we reuse the test, or use new test file) do not use this method. > + > + echo "*.r filter=protocol" >.gitattributes && > + git add . && > + git commit . -m "test commit" && > + git branch empty && Err... I think it would be better to name it 'empty-branch' (or 'almost-empty-branch', as it does include .gitattributes file). See my mistake below (marked <del>...</del>). > + > + cp ../test.o test.r && > + cp ../test2.o test2.r && What does this test2.o / test2.r file tests, that test.o / test.r doesn't? The name doesn't tell us. Why it is test.r, but test2.r? Why it isn't test1.r? > + mkdir testsubdir && > + cp "../test3 - subdir.o" "testsubdir/test3 - subdir.r" && Why it needs to have different contents? I guess that you test two things here: file in a subdirectory, and file with spaces in names. Shouldn't it be better split into two separate test files? > + >test4-empty.r && You test ordinary file, file in subdirectory, file with filename containing spaces, and an empty file. Other tests of single file `clean`/`smudge` filters use filename that requires mangling; maybe we should use similar file? special="name with '\''sq'\'' and \$x" && echo some test text >"$special" && In case of `process` filter, a special filename could look like this: process_special="name=with equals and\nembedded newlines\n" && echo some test text >"$process_special" && > + > + check_filter \ > + git add . \ I assume that this kind of test is here also to check that we are not regressing / backsliding, and we do not start to run "clean" operation more than once per file for "git add", isn't it? > + <<-\EOF && > + 1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK] > + 1 IN: clean test2.r 14 [OK] -- OUT: 14 . [OK] > + 1 IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK] > + 1 IN: clean testsubdir/test3 - subdir.r 23 [OK] -- OUT: 23 . [OK] > + 1 START > + 1 STOP > + 1 wrote filter header > + EOF First, this indentation level confirms that the check_filter function is too clever by half, and that preparing expected.log file should be a separate step. Second, if we run "sort" on contents to be in expected.log, we can write it in more natural, and less fragile way: + sort >expected.log <<-\EOF && + 1 START + 1 wrote filter header + 1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK] + 1 IN: clean test2.r 14 [OK] -- OUT: 14 . [OK] + 1 IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK] + 1 IN: clean testsubdir/test3 - subdir.r 23 [OK] -- OUT: 23 . [OK] + 1 STOP + EOF Third, why the filter even writes output size? It is no longer part of `process` filter driver protocol, and it makes test more fragile. If we are to keep sizes, then to make test less fragile with respect to changes in contents of tested files, we should use variables containing file size: test_r_size=$(wc -c test.r) ... sort >expected.log <<-EOF && ... 1 IN: clean test.r $test_r_size [OK] -- OUT: $test_r_size . [OK] > + > + check_filter_count_clean \ > + git commit . -m "test commit" \ I guess that you use "git commit ." (not very visible this '.') in order to force cleaning of all files, isn't it? Use of *_count_clean function is here, from what I remember, because 'git commit .' sometimes call `clean` multiple times for the same file (?), and sometimes it calls `smudge` (probably as part of some optimization?). I guess that fixing "git commit" so that calls clean operation at most once per file is left for a separate patch series; this one is long enough and involved enough as it is. > + <<-\EOF && > + x IN: clean test.r 57 [OK] -- OUT: 57 . [OK] > + x IN: clean test2.r 14 [OK] -- OUT: 14 . [OK] > + x IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK] > + x IN: clean testsubdir/test3 - subdir.r 23 [OK] -- OUT: 23 . [OK] > + 1 START > + 1 STOP > + 1 wrote filter header > + EOF > + > + rm -f test?.r "testsubdir/test3 - subdir.r" && Why 'test?.r' when we are removing only 'test2.r'; why not be explicit? > + > + check_filter_ignore_clean \ > + git checkout . \ > + <<-\EOF && > + START > + wrote filter header > + IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK] > + IN: smudge testsubdir/test3 - subdir.r 23 [OK] -- OUT: 23 . [OK] Ah, I see that there are no shenningans for `clean` operation, calling op multiple time for single file. > + STOP > + EOF > + > + check_filter_ignore_clean \ > + git checkout empty \ <del> First, isn't it test4-empty.r? Trying to check out non-existent file should not run filter, isn't it? How the heck this passed??? There is no branch 'empty'. Second, the one-shot filter tests have empty-in-worktree and empty-in-repo files; why not reuse them? </del> My mistake, but the branch is named a bit strange. > + <<-\EOF && > + START > + wrote filter header > + STOP > + EOF Why is even filter process invoked? If this is not expected, perhaps simply ignore what checking out almost empty branch (one without any files marked for filtering) does. Shouldn't we test_expect_failure no-call? > + > + check_filter_ignore_clean \ > + git checkout master \ Does this checks different code path than 'git checkout .'? For example, does this test increase code coverage (e.g. as measured by gcov)? If not, then this test could be safely dropped. > + <<-\EOF && > + START > + wrote filter header > + IN: smudge test.r 57 [OK] -- OUT: 57 . [OK] > + IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK] > + IN: smudge test4-empty.r 0 [OK] -- OUT: 0 [OK] > + IN: smudge testsubdir/test3 - subdir.r 23 [OK] -- OUT: 23 . [OK] Can we assume that Git would pass files to filter in alphabetical order? This assumption might make the test unnecessary fragile. > + STOP > + EOF > + > + check_rot13 ../test.o test.r && > + check_rot13 ../test2.o test2.r && > + check_rot13 "../test3 - subdir.o" "testsubdir/test3 - subdir.r" All right. > + ) > +' > + > +test_expect_success PERL 'required process filter should clean only and take precedence' ' Trying to describe it better results in overly long description, which probably means that this test should be split into few smaller ones: - `process` filter takes precedence over `clean` and/or `smudge` filters, regardless if it supports relevant ("clean" or "smudge") capability or not - `process` filter that includes only "clean" capability should clean only (be used only for 'clean' operation) - required process filter should do something (???) > + test_config_global filter.protocol.clean ./../rot13.sh && > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" && > + test_config_global filter.protocol.required true && > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + echo "*.r filter=protocol" >.gitattributes && > + git add . && > + git commit . -m "test commit" && > + git branch empty && > + > + cp ../test.o test.r && > + > + check_filter \ > + git add . \ > + <<-\EOF && > + 1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK] > + 1 START > + 1 STOP > + 1 wrote filter header > + EOF > + > + check_filter_count_clean \ > + git commit . -m "test commit" \ Is this part really necessary? I think it duplicates what we have tested earlier, and would not catch any new errors. Removing spurious/redundant tests results in faster testsuite, which is quite important. > + <<-\EOF > + x IN: clean test.r 57 [OK] -- OUT: 57 . [OK] > + 1 START > + 1 STOP > + 1 wrote filter header > + EOF And this test checks only the first one from the list. Well, actually the first part, without "regardless if it supports relevant ('clean' [...]) capability or not". > + ) > +' > + In my opinion all functions should be placed at beginning, or even in separate file (if they are used in more than one test). > +generate_test_data () { The name is not good, it doesn't describe what kind of data we want to generate. > + LEN=$1 > + NAME=$2 > + test-genrandom end $LEN | Why do you use 'end' as <seed_string> parameter to test-genrandom? > + perl -pe "s/./chr((ord($&) % 26) + 97)/sge" >../$NAME.file && Those constants (26 and 97) are a bit cryptic; magical constants. I guess this is + perl -pe "s/./chr((ord($&) % (ord('z') - ord('a') + 1) + ord('a'))/sge" >../$NAME.file && or + perl -pe "s/./chr((ord($&) % 26 + ord('a'))/sge" >../$NAME.file && That is, convert to a-z range (why not ASCII printable characters, that is characters from ' ' / chr(32) to '~' / chr(126), which is 95 characters instead of 26?) I guess this is so we can be sure that rot13 filter would work (note: the filter is defined for A-Za-z, not only a-z, never the mind pass-through for other characters). > + cp ../$NAME.file . && Do we re-generate this file each time? > + ./../rot13.sh <../$NAME.file >../$NAME.file.rot13 Anyway, I wonder if taking the last two lines out of the function (as they are not about _generating_ a file) would make it more readable or not. > +} > + > +test_expect_success PERL 'required process filter should process multiple packets' ' > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" && > + test_config_global filter.protocol.required true && > + > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + # Generate data that requires 3 packets > + PKTLINE_DATA_MAXLEN=65516 && Shouldn't this be set once per whole test? > + > + generate_test_data $(($PKTLINE_DATA_MAXLEN )) 1pkt_1__ && > + generate_test_data $(($PKTLINE_DATA_MAXLEN + 1)) 2pkt_1+1 && > + generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 - 1)) 2pkt_2-1 && > + generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 )) 2pkt_2__ && > + generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 + 1)) 3pkt_2+1 && Looks good to me. > + > + echo "*.file filter=protocol" >.gitattributes && > + check_filter \ > + git add *.file .gitattributes \ Should it be shell expansion, or git expansion, that is git add '*.file' .gitattributes > + <<-\EOF && > + 1 IN: clean 1pkt_1__.file 65516 [OK] -- OUT: 65516 . [OK] > + 1 IN: clean 2pkt_1+1.file 65517 [OK] -- OUT: 65517 .. [OK] > + 1 IN: clean 2pkt_2-1.file 131031 [OK] -- OUT: 131031 .. [OK] > + 1 IN: clean 2pkt_2__.file 131032 [OK] -- OUT: 131032 .. [OK] > + 1 IN: clean 3pkt_2+1.file 131033 [OK] -- OUT: 131033 ... [OK] I think it would be better for those sizes to be calculated, not entered by hand. Though in this case this doesn't matter much - it would always be this size. > + 1 START > + 1 STOP > + 1 wrote filter header > + EOF > + git commit . -m "test commit" && Is this needed / necessary? > + > + rm -f *.file && > + git checkout -- *.file && Is this necessary? I guess this checks that it doesn't crash, but we do not check that smudge operation works correctly, as we did for clean. > + > + for f in *.file > + do > + git cat-file blob :$f >actual && > + test_cmp ../$f.rot13 actual > + done Wasn't there helper function for this? > + ) > +' > + > +test_expect_success PERL 'required process filter should with clean error should fail' ' ^^^^^^ ^^^^^^ Errr... what? You have 'should' twice here. Also, does it matter that the error is during clean operation? We don't test that error during smudge operation is handled in the same way, do we? > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" && Do we need to pass 'clean smudge', or does it provide both by default? > + test_config_global filter.protocol.required true && > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + echo "*.r filter=protocol" >.gitattributes && > + > + cp ../test.o test.r && > + echo "this is going to fail" >clean-write-fail.r && > + echo "content-test3-subdir" >test3.r && > + > + # Note: There are three clean paths in convert.c we just test one here. What does this comment is about? What 'three clean paths'? > + test_must_fail git add . > + ) > +' > + > +test_expect_success PERL 'process filter should restart after unexpected write failure' ' > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" && > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + echo "*.r filter=protocol" >.gitattributes && > + > + cp ../test.o test.r && > + cp ../test2.o test2.r && Note that the preparation step is almost the same, and we repeat it over, and over, and over (no shell function for this; and we always do full setup / teardown). > + echo "this is going to fail" >smudge-write-fail.o && > + cat smudge-write-fail.o >smudge-write-fail.r && This cat is cp. > + git add . && > + git commit . -m "test commit" && You don't need to commit for 'git checkout <path>' (e.g. for .) or 'git cat-file -p :<file>' to work. > + rm -f *.r && > + > + check_filter_ignore_clean \ > + git checkout . \ > + <<-\EOF && > + START > + wrote filter header > + IN: smudge smudge-write-fail.r 22 [OK] -- OUT: 22 [WRITE FAIL] > + START > + wrote filter header > + IN: smudge test.r 57 [OK] -- OUT: 57 . [OK] > + IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK] > + STOP > + EOF > + > + check_rot13 ../test.o test.r && > + check_rot13 ../test2.o test2.r && Looks good. > + > + ! test_cmp smudge-write-fail.o smudge-write-fail.r && # Smudge failed! > + ./../rot13.sh <smudge-write-fail.o >expected && > + git cat-file blob :smudge-write-fail.r >actual && > + test_cmp expected actual # Clean worked! This is almost negation of check_rot13 - perhaps a helper function would help here (check_not_rot13?). Also, what this comment is about, and why so far to the right? > + ) > +' > + > +test_expect_success PERL 'process filter should not restart in case of an error' ' Errr... what? This description is not clear. Did you mean that filter should not be restarted if it *signals* an error with file (either before sending anything, or after sending partial contents)? > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" && > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + echo "*.r filter=protocol" >.gitattributes && > + > + cp ../test.o test.r && > + cp ../test2.o test2.r && > + echo "this will cause an error" >error.o && > + cp error.o error.r && And here you (correctly) use cp, and not cat. > + git add . && > + git commit . -m "test commit" && > + rm -f *.r && > + > + check_filter_ignore_clean \ > + git checkout . \ > + <<-\EOF && > + START > + wrote filter header > + IN: smudge error.r 25 [OK] -- OUT: 0 [ERROR] > + IN: smudge test.r 57 [OK] -- OUT: 57 . [OK] > + IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK] > + STOP > + EOF > + > + check_rot13 ../test.o test.r && > + check_rot13 ../test2.o test2.r && > + test_cmp error.o error.r Looks good to me. > + ) > +' > + > +test_expect_success PERL 'process filter should be able to signal an error for all future files' ' Did you mean here that filter can abort processing of all future files? > + test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" && > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + echo "*.r filter=protocol" >.gitattributes && > + > + cp ../test.o test.r && > + cp ../test2.o test2.r && > + echo "error this blob and all future blobs" >abort.o && > + cp abort.o abort.r && > + git add . && > + git commit . -m "test commit" && > + rm -f *.r && > + > + check_filter_ignore_clean \ > + git checkout . \ > + <<-\EOF && > + START > + wrote filter header > + IN: smudge abort.r 37 [OK] -- OUT: 0 [ABORT] > + STOP How can we know that 'abort' file is processed first? Though more resilent solution would be harder to create... > + EOF > + > + test_cmp ../test.o test.r && > + test_cmp ../test2.o test2.r && > + test_cmp abort.o abort.r > + ) > +' > + > +test_expect_success PERL 'invalid process filter must fail (and not hang!)' ' > + test_config_global filter.protocol.process cat && We could use rot13.sh, that is one-shot filter here. > + test_config_global filter.protocol.required true && All right, filter is required to easier distinguish it not working from not filtered. > + rm -rf repo && > + mkdir repo && > + ( > + cd repo && > + git init && > + > + echo "*.r filter=protocol" >.gitattributes && > + > + cp ../test.o test.r && > + test_must_fail git add . 2> git_stderr.log && > + grep "not support long running filter protocol" git_stderr.log Shouldn't this use gettext poison (or rather C locale)? This error message could be translated in the future. > + ) > +' > + > test_done I wonder how does the code coverage for the new v2 filter code looks like... Anyway, I think it would be good idea to write at the beginning of new tests (be they in old test, or in new test) what we want to test: - that 'clean' and 'smudge' operations are invoked, for all possible combinations (covering all code paths), and that filter is invoked only once - that special types of files work: * empty file (in worktree, in index, in repo) * file in subdirectory * filename with special characters * large file (test marked as EXPENSIVE), multiple maximum packet size * perhaps binary file? - that 'process' overrides old-style 'clean' and 'smudge' filters, regardless of the former capabilities - that limiting capabilities works - that requiring filter works correctly (doubles number of tests, at least for a subset of them) - that filter is restarted if it fails on non-required, fails git command if required - that filter can error out out of filtering a file, upfront and after partial contents, without restart; fails git command if required - that filter can abort, fails git command if required (?) > diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl > new file mode 100755 > index 0000000..8958f71 > --- /dev/null > +++ b/t/t0021/rot13-filter.pl > @@ -0,0 +1,191 @@ > +#!/usr/bin/perl > +# > +# Example implementation for the Git filter protocol version 2 > +# See Documentation/gitattributes.txt, section "Filter Protocol" > +# > +# The script takes the list of supported protocol capabilities as > +# arguments ("clean", "smudge", etc). > +# > +# This implementation supports special test cases: > +# (1) If data with the pathname "clean-write-fail.r" is processed with > +# a "clean" operation then the write operation will die. > +# (2) If data with the pathname "smudge-write-fail.r" is processed with > +# a "smudge" operation then the write operation will die. > +# (3) If data with the pathname "error.r" is processed with any > +# operation then the filter signals that it cannot or does not want > +# to process the file. > +# (4) If data with the pathname "abort.r" is processed with any > +# operation then the filter signals that it cannot or does not want > +# to process the file and any file after that is processed with the > +# same command. Nice to have this description. BTW. why write-fail is per operation (clean or smudge), but error and abort is not? > +# > + > +use strict; > +use warnings; I guess there is some duplication with the code in contrib, isn't it? > + > +my $MAX_PACKET_CONTENT_SIZE = 65516; > +my @capabilities = @ARGV; > + > +open my $debug, ">>", "rot13-filter.log"; or die "cannot open file for appending: $!"; Good, three argument open. Bad (?), not error handling. > + > +sub rot13 { > + my ($str) = @_; ^^^^ Why 4 spaces, and not TAB character? I think my $str = shift; is more idiomatic Perl. > + $str =~ y/A-Za-z/N-ZA-Mn-za-m/; Why not use tr/// version of this quote-like operation? Or do you follow prior art here? > + return $str; > +} > + > +sub packet_bin_read { > + my $buffer; > + my $bytes_read = read STDIN, $buffer, 4; > + if ( $bytes_read == 0 ) { > + > + # EOF - Git stopped talking to us! > + print $debug "STOP\n"; > + exit(); > + } > + elsif ( $bytes_read != 4 ) { > + die "invalid packet size '$bytes_read' field"; Errr, $bytes_read is not packet size field. It is $buffer. Also, error message looks strange invalid packet size '004' field Shouldn't it be at end? > + } > + my $pkt_size = hex($buffer); $pkt_size greater than $MAX_PACKET_CONTENT_SIZE is also an error, as is sizes 1-3 (not that it matters much, at least here). > + if ( $pkt_size == 0 ) { > + return ( 1, "" ); It feels a bit strange to me to return list instead of hashref, but this is a matter of opinion. > + } > + elsif ( $pkt_size > 4 ) { > + my $content_size = $pkt_size - 4; > + $bytes_read = read STDIN, $buffer, $content_size; > + if ( $bytes_read != $content_size ) { > + die "invalid packet ($content_size expected; $bytes_read read)"; It would read, strangely "invalid packet (8 expected, 7 read)" The "size" or "bytes" is missing from this output. > + } > + return ( 0, $buffer ); > + } > + else { > + die "invalid packet size"; Is keep-alive packet valid ("0004")? > + } > +} > + > +sub packet_txt_read { > + my ( $res, $buf ) = packet_bin_read(); > + unless ( $buf =~ /\n$/ ) { > + die "A non-binary line SHOULD BE terminated by an LF."; First, if SHOULD BE, then perhaps 'warn' not 'die'... though for tests it is probably better to 'die'. Second, we should probably print (a fragment of) this line. > + } > + return ( $res, substr( $buf, 0, -1 ) ); Same comment as for example file in contrib/ - use s/// and no need for substr stuff. > +} > + > +sub packet_bin_write { > + my ($packet) = @_; > + print STDOUT sprintf( "%04x", length($packet) + 4 ); > + print STDOUT $packet; > + STDOUT->flush(); > +} > + > +sub packet_txt_write { > + packet_bin_write( $_[0] . "\n" ); > +} > + > +sub packet_flush { > + print STDOUT sprintf( "%04x", 0 ); > + STDOUT->flush(); > +} Looks good to me (though same comments as to contrib/ file applies). > + > +print $debug "START\n"; > +$debug->flush(); > + > +( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize"; > +( packet_txt_read() eq ( 0, "version=2" ) ) || die "bad version"; > +( packet_bin_read() eq ( 1, "" ) ) || die "bad version end"; > + > +packet_txt_write("git-filter-server"); > +packet_txt_write("version=2"); > + > +( packet_txt_read() eq ( 0, "clean=true" ) ) || die "bad capability"; > +( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability"; > +( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end"; > + > +foreach (@capabilities) { > + packet_txt_write( $_ . "=true" ); > +} > +packet_flush(); > +print $debug "wrote filter header\n"; Or perhaps "handshake end"? > +$debug->flush(); > + > +while (1) { > + my ($command) = packet_txt_read() =~ /^command=([^=]+)$/; > + print $debug "IN: $command"; > + $debug->flush(); > + > + my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/; All right, here list context is necessary. > + print $debug " $pathname"; No " pathname=$pathname" ? > + $debug->flush(); > + > + # Flush > + packet_bin_read(); Same comment as earlier: read_flush, or read_varlist (till flush) to have would be better. > + > + my $input = ""; > + { > + binmode(STDIN); > + my $buffer; > + my $done = 0; > + while ( !$done ) { > + ( $done, $buffer ) = packet_bin_read(); > + $input .= $buffer; > + } > + print $debug " " . length($input) . " [OK] -- "; > + $debug->flush(); > + } > + > + my $output; > + if ( $pathname eq "error.r" or $pathname eq "abort.r" ) { > + $output = ""; > + } > + elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) { > + $output = rot13($input); > + } > + elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) { > + $output = rot13($input); > + } > + else { > + die "bad command '$command'"; > + } > + > + print $debug "OUT: " . length($output) . " "; Shouldn't we write the length of output only if we don't error out, abort, or fail? > + $debug->flush(); > + > + if ( $pathname eq "error.r" ) { > + print $debug "[ERROR]\n"; > + $debug->flush(); > + packet_txt_write("status=error"); > + packet_flush(); > + } > + elsif ( $pathname eq "abort.r" ) { > + print $debug "[ABORT]\n"; > + $debug->flush(); > + packet_txt_write("status=abort"); > + packet_flush(); > + } Looks good, so this is upfront status=error or status-abort. > + else { > + packet_txt_write("status=success"); > + packet_flush(); > + > + if ( $pathname eq "${command}-write-fail.r" ) { > + print $debug "[WRITE FAIL]\n"; > + $debug->flush(); > + die "${command} write error"; > + } > + > + while ( length($output) > 0 ) { > + my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE ); > + packet_bin_write($packet); > + print $debug "."; All right, so number of dots is the number of packets. This is surprisingly opaque. > + if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) { > + $output = substr( $output, $MAX_PACKET_CONTENT_SIZE ); > + } > + else { > + $output = ""; > + } > + } > + packet_flush(); > + print $debug " [OK]\n"; > + $debug->flush(); > + packet_flush(); Should we test partial contents case? Or failure during printing? What happens then - is file cleared by Git, or left partially converted? > + } > +} > Keep up good work. Looks quite good. -- Jakub Narębski