On Thu, Dec 14, 2023 at 3:05 AM Patrick Steinhardt <ps@xxxxxx> wrote: > On Wed, Dec 13, 2023 at 10:22:48PM -0500, Eric Sunshine wrote: > > This is an alternative solution to the issue Patrick's patch[1] > > addresses. Hopefully, this approach should avoid the sort of push-back > > Patrick's patch received[2]. > > > > By the way, I'm somewhat surprised that this issue crops up at all > > considering that --no-index is being used with git-diff. As such, I > > would have thought that the local repository's format would not have > > been interrogated at all. If that's a bug in `git diff --no-index`, then > > fixing that could be considered yet another alternative solution to the > > issue raised here. > > This strongly reminds me of the thread at [1], where a similar issue was > discussed for git-grep(1). Quoting Junio: > > > I actually do not think these "we are allowing Git tools to be used > > on random garbage" is a good idea to begin with X-<. If we invented > > something nice for our variant in "git grep" and wish we can use it > > outside the repository, contributing the feature to implementations > > of "grep" would have been the right way to move forward, instead of > > contaminating the codebase with things that are not related to Git. > > So this might not be the best way to go. I recall Junio mentioning that, and I'm fine with the conclusion that "fixing" --no-index is counter to the project's goals. > > - sed -e '/^[ ]*$$/d' chainlint/$$i.expect; \ > > + sed -e 's/[ ][ ]*/ /g;/^ *$$/d;s/^ //;s/ $$//;s/\([<>|();&]\) /\1/g;s/ \([<>|();&]\)/\1/g' chainlint/$$i.expect; \ > > > > The sed expressions for normalizing whitespace prior to `diff` may look > > a bit hairy, but they are simple enough in concept: > > > > * collapse runs of whitespace to a single SP > > * drop blank lines (this step is not new) > > * fold out possible SP at beginning and end of each line > > * fold out SP surrounding common punctuation characters used in shell > > scripts, such as `>`, `|`, `;`, etc. > > These sed expressions do look hairy indeed. I have to wonder: all that > we're doing here is to munge the expected files we already have in our > tree. Can't we fix those to look exactly like the actual results instead > and then avoid any kind of post processing altogether? If I understand > correctly the only reason we do this post processing is because the > original implementation of the chainlinter produced slightly different > whitespace. Yes and no. It's not just whitespace. I did strongly consider submitting patches to fix all the whitespace differences in the "expect" files when chainlint.pl replaced chainlint.sed, but I particularly didn't want to plague the mailing list with such noise. It's really just unnecessary churn since it's so easy to work around it with minor sed magic. And time tells me that that was probably the correct decision since the output of chainlint.pl has changed multiple times. Even the output of chainlint.sed wasn't necessarily stable[1]. Then, of course chainlint.pl replaced[2] chainlint.sed. The original implementation or chainlint.pl just dumped out the parsed token stream, but [3] improved it to preserve the original formatting of the test snippet, and [4] annotated the output with line numbers of the original test snippet. Had those changes been accompanied by extra patches to "fix" the "expect" files to suit, it would have been just that much more noise both in terms of patches to review and in terms of churn in the actual history. And, who knows, the output of chainlint.pl might change/improve again some day. So, I still favor using sed to smooth over these minor differences rather than "fixing" the "expect" file repeatedly to adjust them for changes which are not significant to what is actually being tested. [1]: d73f5cfa89 (chainlint.sed: stop splitting "(..." into separate lines "(" and "...", 2021-12-13) [2]: d00113ec34 (t/Makefile: apply chainlint.pl to existing self-tests, 2022-09-01) [3]: 73c768dae9 (chainlint: annotate original test definition rather than token stream, 2022-11-08) [4]: 48d69d8f2f (chainlint: prefix annotated test definition with line numbers, 2022-11-11)