Re: [PATCH 2/6] quote_path: give flags parameter to quote_path()

Jeff King <peff@xxxxxxxx> · Thu, 10 Sep 2020 16:26:55 -0400

On Thu, Sep 10, 2020 at 08:17:32AM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> > Of course none of the above becomes unnecessary if we scan the whole
> > string for SP before the main loop in quote-c-style-counted, but the
> > function was written to process the input in a single pass and such
> > a change defeats its design.  If we need to do it in two passes, we
> > can have the caller do so anyway, at least for now.  That thinking
> > led to the final organization of the series, with two steps that
> > used to be preparatory for passing the flag down thru to the bottom
> > layer rebased out as a discardable appendix at the end.
> 
> Actually, this made me realize that another variant is possible.
> It might be easier to read, or it might not.  Since I cannot tell
> without actually writing one, let's see ...

Vger seems to be delivering slowly and out-of-order the last day or two,
so I got rather confused to receive this after seeing your v2. :)

> I don't know if this is easier to follow or not.  I do think so
> right now but that is only because it is still fresh in my brain.

I do think it is easier to read than the original.

One minor nit with your analysis, though: the current code is actually
two-pass already. One pass finds the next character that needs quoting,
but then we have to take another pass to copy the run before it into
place. That second pass can be done with a memcpy(), which helps.
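
To make the "two-pass already" shape concrete, here is a minimal sketch
of what I mean. It is not the code in quote.c; needs_quote() and
quote_run() are made-up names, the quoting rules are simplified, and the
caller is assumed to provide an output buffer of at least 4 * len + 1
bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: simplified rule for "this byte needs quoting". */
static int needs_quote(unsigned char c)
{
	return c == '"' || c == '\\' || c < 0x20 || c == 0x7f;
}

/*
 * Hypothetical helper, not git's API. Copies `len` bytes from `in` to
 * `out`, escaping bytes that need quoting. `out` must hold at least
 * 4 * len + 1 bytes. Sets *quoted to 1 if anything was escaped and
 * returns the number of bytes written (excluding the trailing NUL).
 */
static size_t quote_run(const char *in, size_t len, char *out, int *quoted)
{
	size_t o = 0;

	*quoted = 0;
	while (len) {
		size_t run = 0;
		unsigned char c;

		/* Pass 1: scan ahead to the next byte that needs quoting. */
		while (run < len && !needs_quote((unsigned char)in[run]))
			run++;

		/* Pass 2: bulk-copy the clean run we just scanned. */
		memcpy(out + o, in, run);
		o += run;
		in += run;
		len -= run;
		if (!len)
			break;

		/* Emit the escaped form of the offending byte. */
		c = (unsigned char)*in++;
		len--;
		*quoted = 1;
		out[o++] = '\\';
		if (c == '"' || c == '\\') {
			out[o++] = (char)c;
		} else {
			/* three-digit octal escape, e.g. TAB becomes \011 */
			out[o++] = (char)('0' + ((c >> 6) & 3));
			out[o++] = (char)('0' + ((c >> 3) & 7));
			out[o++] = (char)('0' + (c & 7));
		}
	}
	out[o] = '\0';
	return o;
}
```

Every clean byte is touched twice (once by the scan, once by memcpy()),
but the caller learns whether quoting was needed at all from *quoted
without a separate pre-scan.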

If you know you are quoting, you can do a true single-pass
character-by-character copy. But of course part of our task is to find out
_if_ we are quoting. And even if that were not the case, on modern
processors it is not always true that single-pass is going to be faster.
This code is definitely not such a hot-spot that it's worth doing that
kind of micro-optimization.

  Aside: We _do_ have spots where that is not true. When I looked at
  replacing xdiff's hash the sticking point was that we compute the
  newlines _and_ hash in a single pass. Most "fast" hash functions are
  optimized to take bigger sequences of data, but splitting out the
  newline-finding eliminated any gains.
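
For illustration, the combined loop in that aside looks roughly like the
sketch below. This is not xdiff's actual code; hash_line() is a made-up
name and the djb2-style mixing step is only an assumption chosen to keep
the example short.

```c
#include <stddef.h>

/*
 * Hypothetical helper: hash one line starting at *cur and advance *cur
 * past its newline (or to `end` if there is none). The newline search
 * and the hashing share a single loop, so each byte is read only once.
 */
static unsigned long hash_line(const char **cur, const char *end)
{
	const char *p = *cur;
	unsigned long h = 5381;

	while (p < end && *p != '\n') {
		h = (h << 5) + h;		/* h * 33 */
		h ^= (unsigned char)*p++;
	}
	if (p < end)
		p++;				/* step over the newline */
	*cur = p;
	return h;
}
```

Swapping in a block-oriented hash means finding the end of the line
first (say, with memchr()) and then hashing the span, which reads every
byte a second time.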

-Peff