Hi Junio & Philippe, On Wed, 24 Aug 2022, Junio C Hamano wrote: > Philippe Blain <levraiphilippeblain@xxxxxxxxx> writes: > > >>>> However, I've tried it on a more "real-life" example, and then I get: > >>>> > >>>> error: mismatched output from interactive.diffFilter > >>>> hint: Your filter must maintain a one-to-one correspondence > >>>> hint: between its input and output lines. > >>>> > >>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should > >>>> keep the number of lines the same. > >>> > >>> Would you mind sharing the example with me? > >> > >> In trying to write a reproducer, I realize that this is in fact a separate "bug". > >> I happened to have a submodule in the repository I was testing, and it had modified > >> content. This is what was causing the "mismatched output" error, although I'm not > >> sure why. If I use a pathspec that does not include the submodule when invoking > >> 'git add -p', it works correctly. > >> > >> I'm a bit out of time now but I'll try to write a separate > >> reproducer for this later today. > > > > So in trying to write a new reproducer, I found the cause of the > > bug. ... Ah. Submodules. The gift that keeps on giving. I'm not _really_ complaining, though. It seems that with beautiful regularity, submodules come up as crucial parts of attack vectors in exploits that are reported to the git-security mailing list, requiring somebody with my abilities to implement fixes. Therefore, submodules award me with a sort of job security. :-) > I somehow had an impression that that the C reimplementation was > designed to be a bug-for-bug compatible faithful rewrite of the > original scripted version, Yes, it was designed as that, with a few things done differently, though, such as avoiding unnecessary `git diff` invocations, and using C-native data structures and memory management. One unfortunate consequence of avoiding the `git diff` invocations of `git-add--interactive.perl` that are not actually necessary is that I missed that one of those invocations specifically ignores dirty submodules, and that the Perl script then only processes files whose numstat shows up in _that_ diff output. Fortunately, the fix is easy: simply ignore dirty submodules in _all_ `git diff` invocations of `add -p`. I will submit the next iteration as soon as the PR builds pass. > but looking at the way how this bug is handled, I am not sure (it looks > as if the initial fix was to patch the code to match the symptom, not to > make the code to behave more faithfully to the scripted version). The built-in `add -p` does something slightly different from the Perl version when it comes to the line ranges in hunk headers: instead of storing stringified versions of them and parsing & re-rendering them when the hunks are modified, the C version stores the actual numbers, adjusts them as needed (e.g. when hunks are split), and, crucially, renders them on the fly when displaying the hunks. That means that also the colorized hunk headers are generated on the fly whenever the hunks are displayed, and they are never stored in rendered form, neither while parsing diffs nor when hunks are split or otherwise edited. That's why the built-in `add -p` looked for a `@@ ... @@` part in the colorized diff, so that it can display the remainder of that line after showing the rendered line range. And yes, this is a deviation from the bug-for-bug principle I follow for conversions from scripted commands to built-in C code, but it was for a good reason: memory management in C is less trivial than in Perl (where one essentially YOLOs memory management), and I was not prepared to invent a rope data structure just to be able to replace parts of a long string buffer with rendered text of a different length, nor was I prepared to call `memmove()` tons of times. Note that the Perl version does something slightly iffy, too: if a diff colorizer is configured, it initially shows the original hunk headers (e.g. "@ file:16 @" in the `diff-so-fancy` case) but when the hunk is modified in any way, it generates "@@ -1,6 +1,7 @@"-style hunk headers and does not show the ones generated by the diff colorizer _at all_ anymore. In this patch series, I teach the built-in `add -p` to be more lenient and simply show the entire original hunk header after the rendered line range if no `@@ ... @@` part was found. If that part was found, we still replace it with the rendered line range. Yes, it is a deviation from what the Perl version does. I consider it an improvement, though. > As with any big rewrite, a complete re-implementation always has risks > to introduce unintended regressions and that is why starting with > faithful rewrite with the staged transition plan is valuable. > > In this case, the staged transition plan, the step to flip the > default without before remove the old one, is working beautifully. > And it was only made possible with the actual users reporting issues > before they say "the new one is broken, so let's use the knob to > retain the old implementation". Indeed, and thank you, Philippe, for bringing this to the Git mailing list so that I could do something about the bug. Ciao, Dscho