Re: [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter

Stefan Beller <sbeller@xxxxxxxxxx> · Mon, 22 May 2017 16:36:34 -0700

On Thu, May 18, 2017 at 4:33 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:

> I know I suggested the paragraph above, but after rereading your patch
> set, I think I finally understand what you're trying to accomplish.
> I think it's better to combine patches 4/20, 5/20, and 6/20, with the
> following commit message:

and patch 7/20 (diff.c: inline emit_line_0 into emit_line)

>   diff: introduce more flexible emit function
>
>   Currently, diff output is written either through the emit_line_0
>   function or through the FILE * in struct diff_options directly. To
>   make it easier to teach diff to buffer its output (which will be done
>   in a subsequent commit), introduce a more flexible emit() function. In
>   this commit, direct usages of emit_line_0() are replaced with emit();
>   subsequent commits will also replace usages of the FILE * with emit().

ok. sounds reasonable to me. I kept them separate for easier review
originally, but I can certainly combine them again.

>
> And the function itself can be documented this way (with the appropriate
> formatting):

I have some documentation in 19/20 in diff.h for the data structure stored.
We'd want to discuss where to put the documentation. I'd not like to add
such documentation to some static function in diff.c but rather only document
the data structure, and then (after in-lining <...>_0 to emit_line) keep the
function documentation rather short as:

/* see struct buffered_patch_line */
static void emit_line(...lots of args...)
{
    /* work on said data structure */
    ...

> If you do all that, then the buffering patch (19/20) can be improved by
> adding this comment somewhere in the file:
>
>   Buffer the diff output into ??? instead of immediately writing it to
>   "file".
>
>   NEEDSWORK: The contents of the ??? array - in particular, how the diff
>   output is divided into array elements - is not precisely defined; some
>   functions may emit a line all at once (resulting in one element)
>   whereas some others may emit a line piecemeal (resulting in more than
>   one element). Ideally, the code in this file should be structured so
>   that we do not have such imprecision, but in the meantime, callers
>   that request buffering should ensure that the diff output is divided
>   the way they expect (and have tests to ensure that it remains so).

I want to rename that struct to "buffered_diff_piece" or
"partial_diff_output", not implying we have a line or other another
specific piece.

Rethinking the discussion with Junio, what the correct abstraction level is,
we could even name it "one_color_piece", or "colored_diff_part", to
imply we'd want to have as much content as possible in the same
color. (And line endings always being in RESET color, we'd have a
natural separation at EOL)

>
>> With this patch other callers hard code the sign (which are '+', '-',
>> ' ' and '\\') such that we do not run into unexpectedly emitting an
>> erroneous '\0'.
>
> I still don't understand this paragraph - can you rewrite this in the
> imperative tense?

As in this patch the function signature is the same, but just changes
meaning of one of the arguments, one could imagine that there
could be a caller with first='\0' given some interesting data/mode of operation,
which would have instructed to literally output a '\0'. We would not do this
any more as the meaning changed.

However this hypothetical subtle bug was not introduced, because
there are no callers that call the function with a 'first' depending on
user provided data or otherwise hard-coded  expectation of a '\0' output.

Maybe I'll just drop this part.