On Mon, May 15, 2017 at 9:14 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > Overall, this patch seems larger than it should to me, although there > might be good reasons for that that I don't know. I'll remark on what > I find unexpected. > >> >> +static void emit_buffered_patch_line_ws(struct diff_options *o, >> + struct buffered_patch_line *e, >> + const char *ws, unsigned ws_rule) > > This introduces a new _ws emission function - how is this used and how > is this different from the non-ws one? I see BPL_EMIT_LINE_WS, but I > don't see the caller that introduces that constant in this patch. See emit_line_ws, which makes use of BPL_EMIT_LINE_WS. The difference between BPL_EMIT_LINE_WS and BPL_EMIT_LINE_AS_IS is that _WS is emitted marking up whitespace differently (e.g. 8 continuous spaces instead of a tab or such), see the core.whitespace option. Relevant hunk: @@ -557,9 +616,12 @@ static void emit_line_ws(struct diff_options *o, const char *line, int len, const char *ws, unsigned ws_rule) { - emit_line_0(o, set, reset, sign, "", 0); - ws_check_emit(line, len, ws_rule, - o->file, set, reset, ws); + struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_WS}; + + if (o->use_buffer) + append_buffered_patch_line(o, &e); + else + emit_buffered_patch_line_ws(o, &e, ws, ws_rule); } >> + switch (e->state) { >> + case BPL_EMIT_LINE_ASIS: >> + emit_buffered_patch_line(o, e); >> + break; >> + case BPL_EMIT_LINE_WS: >> + emit_buffered_patch_line_ws(o, e, ws, ws_rule); >> + break; >> + case BPL_HANDOVER: >> + o->current_filepair++; > > If we're just buffering the diff output, do we need to store > per-file-pair metadata? (I assume that's why you need a special > handover constant.) Clients can already read what they need from the > diff output. Currently we keep only whitespace settings per-file separate as they are defined per path (via attributes). So I read this comment as if you consider the per-file buffer unneeded and we could just detect the next file via the line <line-prefix>diff --git a/<file> b/<file> and then re-read the attributes and remember only the current files settings in the output pass. I'll look into that. >> + * >> + * TODO (later in this series): >> + * We'll unset this flag in a later patch. >> + */ >> + o->use_buffer = 1; > > What I would do is to add a demonstration patch at the end of the > patch series (which is not supposed to be queued) to avoid such churn > in history, but I'm not sure how the Git project prefers to do this. ok, I can omit this part in a reroll. >> + * >> + * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer >> + * into the pre/post image file. This pointer could be a union with the >> + * line pointer. By storing an offset into the file instead of the literal line, >> + * we can decrease the memory footprint for the buffered output. At first we >> + * may want to only have indirection for the content lines, but we could >> + * also have an enum (based on sign?) that stores prefabricated lines, e.g. >> + * the similarity score line or hunk/file headers. > > This would be nice, but come to think of it, might not be possible. > When requesting --word-diff, control characters (or others) might > appear in the output, right? That is why we'd have even more states. ;) Or duplicate word diff still, but lines left intact are referenced via offsets. It's a hard problem to get right, so I defer it via this comment. > >> + */ >> +struct buffered_patch_line { >> + const char *set; >> + const char *reset; >> + const char *line; >> + int len; >> + int sign; >> + enum { >> + BPL_EMIT_LINE_WS, >> + BPL_EMIT_LINE_ASIS, >> + BPL_HANDOVER >> + } state; > > It might be better, for simplicity, just to have one big buffer > including everything (if we decide that we really can't add pointers > to input later). What do you mean here? (Drop the other structs such as the file pair?) Thanks for the review! Stefan