Re: [PATCH 4/1] t3920: replace two cats with a tee

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Dec 03 2022, René Scharfe wrote:

> Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Dec 03 2022, René Scharfe wrote:
>>
>>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@xxxxxx> wrote:
>>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>>> different line endings.  That's shorter and spawns less processes.
>>>>> [...]
>>>>> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
>>>>> ---
>>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>>  create_crlf_ref () {
>>>>> -       cat >.crlf-orig-$branch.txt &&
>>>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>
>>>> This feels slightly magical and more difficult to reason about than
>>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>>> work just as well?
>>>>
>>>>     cat >.crlf-orig-$branch.txt &&
>>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>>
>>> It would work, of course, but this is the exact use case for tee(1).  No
>>> repetition, no extra redirection symbols, just an nicely fitting piece
>>> of pipework.  Don't fear the tee! ;-)
>>>
>>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>>> that PowerShell has a tee command as well.)
>>
>> I don't really care, but I must say I agree with Eric here. Not having
>> surprising patterns in the test suite has a value of its own.
>
> That's a good general guideline, but I wouldn't have expected a pipe
> with three holes to startle anyone. *shrug*

It's more that you're used to seeing one thing, the "cat >in" at the
start of a function is a common pattern.

Then it takes some time to stop and grok an a new pattern. If I was
hacking on a function like that I'd probably stop to try to understand
"why", even though I understood the "what".

I'd then find it was to try to optimize things on Windows a bit... :)

I'm not saying it's not worth it in this case, just pointing out that
boring "standard" patterns have a value of their own in us collectively
understanding them, which has a value of its own. Whether optimizing a
test case outweighs that is another matter (sometimes it would).

>> In this case I wonder if you want to optimize this whether we couldn't
>> do much better with "test_commit_bulk", maybe by teaching it a small set
>> of new tricks.
>>
>> I.e. if I do:
>>
>> 	git fast-export --all
>>
>> At the end of the setup test it seems we just end up with refs with
>> names that correspond to their contents, and with double newlines in
>> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
>> up with that.
>>
>> So maybe we can create them as a patch, possibly with some slight "sed"
>> munging on the input stream, just just teach it to accept a "ref prefix"
>> and "commit message contents". That could just be an argument that you
>> "$(printf "...")", so we don't even need a sub-process....
>
> The files are used later for verification, so their contents can't just
> be passed on via parameters.
>
> Had a similar idea and spent too much time on creating the four files in
> a single awk invocation.  The code was too verbose and yet hard to read
> for my taste.

Hah, I didn't try. Just a suggestion in case it made sense :)

>> Also this:
>>
>>      perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>>
>> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
>> to have intermediate files in the test suite be determanistic, i.e. to
>> always have the full content be in the file at the top after the "top".
>
> Whoa, such a one-liner is a good argument for banishing Perl.
>
> So to rephrase it in a way that I can understand, you say that something
> like this:
>
> 	$ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x
>
> ... will probably report less than 100000 lines because the downpipe
> command ends the whole thing early.

Yes, the "perl" line was just a quick demo hack.

But the point is that the initial perl process on the LHS will be killed
with a SIGPIPE as the "perl" on the RHS stops and a SIGPIPE is
propagated up the chain.

I don't think it matters in this case, but just pointing out that it
*is* an edge case this sort of pattern introduces.

I've sometimes resorted to recursively diffing the trash directories of
two test runs to see if they're the same. E.g. I've caught cases where
the stderr of programs unexpectedly changes, but we had no test coverage
for it.

I think it's good to avoid patterns in general that make test runs
nondeterministic.

In this case it's only nondeterministic on failure, so it's probably
fine.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux