Re: [PATCH v2 3/3] builtin/diff-pairs: allow explicit diff queue flush

Justin Tobler <jltobler@xxxxxxxxx> · Wed, 19 Feb 2025 17:09:15 -0600

On 25/02/17 02:38PM, Phillip Wood wrote:
> Hi Justin
> 
> On 12/02/2025 04:18, Justin Tobler wrote:
> > The diffs queued from git-diff-pairs(1) stdin are not flushed EOF is
> > reached. To enable greater flexibility, allow control over when the diff
> > queue is flushed by writing a single nul byte on stdin between input
> > file pairs. Diff output between flushes is separated by a single line
> > terminator.
> 
> I agree with the comments others have made about the documentation. I also
> have some comments on the implementation below.
> 
> > diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> > index 08f3ee81e5..2436ce3013 100644
> > --- a/builtin/diff-pairs.c
> > +++ b/builtin/diff-pairs.c
> > @@ -99,6 +99,17 @@ int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
> >   			break;
> >   		p = meta.buf;
> > +		if (!*p) {
> > +			flush_diff_queue(&revs.diffopt);
> > +			/*
> > +			 * When the diff queue is explicitly flushed, append an
> > +			 * additional terminator to separate batches of diffs.
> > +			 */
> > +			fprintf(revs.diffopt.file, "%c",
> > +				revs.diffopt.line_termination);
> 
> As the user has requested an explicit flush we should call fflush(stdout)
> here to avoid deadlocking a caller that is waiting to read the terminator
> before writing the next batch of input. Ideally the tests would check that
> the output is flushed but I think that is quite hard to do with our test
> framework.

Good point, this needs to be explicitly flushed. Will fix.

> I think it would be easier for callers to parse the output if we always
> printed NUL here. Programming languages generally have a function that
> allows you to read all the input until a specific byte is seen. If flushing
> always used a NUL terminator the caller could use their equivalent of
> read_until(b'\0') to hoover up the output (using '-z' to do this would
> change the output of --numstat and embed a NUL between any stat data and the
> patch). Using a newline as the terminator here means the caller needs to
> look for "\n\n". That string occurs in the output between the stat data and
> the patch and can also occur in the patch hunks if diff.suppressBlankEmpty
> is set.

I was originally thinking that, without the -z option, a newline to
indicate separation between queued diff batches would be more
human-friendly. Always using a NUL byte would be more appropriate for
parsing though. I'll switch to using only a NUL byte here in the next
version.

> Now that we are calling diff_flush() in a loop we need to set .no_free in
> our diff options and call diff_free() at the end of the program (see the
> comment in diff.h)

Indeed, will fix!

Thanks
-Justin