Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 27, 2017 at 10:06 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>> Looking at the implementation of get_ws_cleaned_string() that is the
>> workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong
>> things for various "ignore whitespace" options (i.e. there is only
>> one implementation, while "git diff" family takes things like
>> "ignore space change", "ignore all whitespace", etc.), though.
>
> This probably deserves a bit more illustration of how I envision the
> code should evolve.
>
> In the longer term, I would prefer to see emitted_symbol_cmp_no_ws()
> to go and instead emitted_symbol_cmp() to take the diff options so
> that it can change the behaviour of the comparison function based on
> the -w/-b/--ignore-space-at-eol/etc. settings.  And compare two strings
> in place.
>
> For that, you may need a helper function that takes a pointer to a
> character pointer, picks the next byte that matters while advancing
> the pointer, and returns that byte.  The emitted_symbol_cmp(a, b)
> which is not used for real comparison (i.e. ordering to see if a
> sorts earlier than b) but for equivalence (i.e. considering various
> whitespace-ignoring settings, does a and b matfch?) may become
> something like:
>
>         int
>         emitted_symbol_eqv(struct emitted_diff_symbol *a,
>                            struct emitted_diff_symbol *b,
>                            const void *keydata) {
>                 struct diff_options *diffopt = keydata;
>                 const char *ap = a->line;
>                 const char *bp = b->line;
>
>                 while (1) {
>                         int ca, cb;
>                         ca = next_byte(&ap, diffopt);
>                         cb = next_byte(&bp, diffopt);
>                         if (ca != cb)
>                                 return 1; /* differs */
>                         if (!ca)
>                                 return 0;
>                 };
>         }
>
> where next_byte() may look like:
>
>         static int
>         next_byte(const char **cp, struct diff_options *diffopt)
>         {
>                 int retval;
>
>         again:
>                 retval = **cp;
>                 if (!retval)
>                         return retval; /* end of string */
>                 if (!isspace(retval)) {
>                         (*cp)++; /* advance */
>                         return retval;
>                 }
>
>                 switch (ignore_space_type(diffopt)) {
>                 case NOT_IGNORING:
>                         (*cp)++; /* advance */
>                         return retval;
>                 case IGNORE_SPACE_CHANGE:
>                         while (**cp && isspace(**cp))
>                                 (*cp)++; /* squash consecutive spaces */
>                         return ' '; /* normalize spaces with a SP */
>                 case IGNORE_ALL_SPACE:
>                         (*cp)++; /* advance */
>                         goto again;
>                 ... other cases here ...
>                 }
>         }
>
>

I just rebased the diff series on top of the hashmap series and now
I want to implement the compare function based on the new data
feature. So I thought I might start off this proposal, as you seem to
have good taste for how to approach problems.

It turns out that the code here is rather a very loose proposal,
not to be copied literally, because of these constraints:
* When dealing with user content, we do not have C-strings,
  but memory + length, such that next_byte also needs to be
  aware of the end pointer.
* The ignore_space_type() as well as these constants do not exist
  as-is, I think the cleanest for now is to parse diffopt->xdl_opts via
  DIFF_XDL_TST

Thanks,
Stefan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux