Re: [PATCH 4/1] t3920: replace two cats with a tee

René Scharfe <l.s.r@xxxxxx> · Sat, 3 Dec 2022 18:22:40 +0100

Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Dec 03 2022, René Scharfe wrote:
>
>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@xxxxxx> wrote:
>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>> different line endings.  That's shorter and spawns less processes.
>>>> [...]
>>>> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
>>>> ---
>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>  create_crlf_ref () {
>>>> -       cat >.crlf-orig-$branch.txt &&
>>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>
>>> This feels slightly magical and more difficult to reason about than
>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>> work just as well?
>>>
>>>     cat >.crlf-orig-$branch.txt &&
>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>
>> It would work, of course, but this is the exact use case for tee(1).  No
>> repetition, no extra redirection symbols, just an nicely fitting piece
>> of pipework.  Don't fear the tee! ;-)
>>
>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>> that PowerShell has a tee command as well.)
>
> I don't really care, but I must say I agree with Eric here. Not having
> surprising patterns in the test suite has a value of its own.

That's a good general guideline, but I wouldn't have expected a pipe
with three holes to startle anyone. *shrug*

> In this case I wonder if you want to optimize this whether we couldn't
> do much better with "test_commit_bulk", maybe by teaching it a small set
> of new tricks.
>
> I.e. if I do:
>
> 	git fast-export --all
>
> At the end of the setup test it seems we just end up with refs with
> names that correspond to their contents, and with double newlines in
> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
> up with that.
>
> So maybe we can create them as a patch, possibly with some slight "sed"
> munging on the input stream, just just teach it to accept a "ref prefix"
> and "commit message contents". That could just be an argument that you
> "$(printf "...")", so we don't even need a sub-process....

The files are used later for verification, so their contents can't just
be passed on via parameters.

Had a similar idea and spent too much time on creating the four files in
a single awk invocation.  The code was too verbose and yet hard to read
for my taste.

> Also this:
>
>      perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>
> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
> to have intermediate files in the test suite be determanistic, i.e. to
> always have the full content be in the file at the top after the "top".

Whoa, such a one-liner is a good argument for banishing Perl.

So to rephrase it in a way that I can understand, you say that something
like this:

	$ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x

... will probably report less than 100000 lines because the downpipe
command ends the whole thing early.

> With a "tee" you need to worry about the "append_cr" function it's being
> piped in stopping the stdin.
>
> I don't think it matters in this case, but in general as a pattern: I do
> fear the "tee" a bit :)

Right, append_cr reads until EOF.

René