[PATCH 0/4] --word-regex sanity checking and such

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Forgot the list and Matthijs on the first sending.  Sorry for the
spam!]

Scott Johnson wrote [trimmed and wraps fixed up]:
> Here's the default `git diff --word-diff`:
>         [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
>         [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
> 
> Which is correct, but less than ideal because it highlights much more than the 
> actual changes.
> 
> So I create a .gitattributes file with one line:
> *.html diff=html
> 
> And rerun `git diff --word-diff`:
>         <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
>         <li><em></em><a href="#">ydn-delicious</a></li>
> 
> Yikes! What happened to the second line of changes? The removed code is not 
> displayed at all.

Michael J Gruber wrote:
> Yep, I just found out myself experimenting with a wordRegex for csv.
> Seems like quite a "Gimme rope" feature...
> 
> So, it's the regex.

Well. Yes. No. Maybe.

Thanks for bringing this to my attention.  I currently have enough
more serious work to avoid that this actually motivated me to hack up
a sanity check.  It's just far too error prone as it is now.

But I cannot reproduce the problem!  I put Scott's two offending lines
(taken from his "straight" diff) into t4034/html/{pre,post}, and I
think the output is valid.  Also, the word regex for html has long
included the |[^[:space:]] safeguard (actually they all do except for
bibtex, which is even more lenient on what it matches).  So you either
found an example that depends on more context (which would be *really*
bad) or there is another source of bad regexes.  Anyway, the safeguard
should easily catch the latter case.

This did unearth a bug in the ruby regex, though, so it's been worth
the trouble.

Various small issues with this patch series:

* [4/4] I stole the html test from Scott's mail, and some of the rest
  from various Wikibooks sources on "Hello World" in each language,
  usually extended by a bit of code that tests the world-splitting
  power.  I hope this is ok with Scott and the Copyright overlords.
  There are only so many ways to spell "Hello World", and only so many
  languages I know myself.

* [4/4] Many patterns do not split 1+2, probably because they stick +2
  together as a signed integer literal, even though I think they
  should.  I ran out of time to investigate however.

* [3/4] was actually detected with the help of [4/4], but putting it
  after would require heavy special casing.

* [2/4] It's a weird idiosyncrasy of the word-diff code that the exit
  status of git-diff does not depend on whether word-diff found any
  differences, and in fact the shown hunks do not either.  So the
  tests are "test_must_fail" regardless of word regex, because the
  input files differ at a byte level.  Maybe at least hunks without
  word differences should be suppressed?


Thomas Rast (4):
  diff.c: pass struct diff_words into find_word_boundaries
  diff.c: implement a sanity check for word regexes
  userdiff: fix typo in ruby word regex
  t4034: bulk verify builtin word regex sanity

 Documentation/config.txt |    8 ++++
 diff.c                   |  104 +++++++++++++++++++++++++++++++++++++++++----
 diff.h                   |    1 +
 t/t4034-diff-words.sh    |   85 +++++++++++++++++++++++++++++++++++++-
 t/t4034/bibtex/expect    |   15 +++++++
 t/t4034/bibtex/post      |   10 ++++
 t/t4034/bibtex/pre       |    9 ++++
 t/t4034/cpp/expect       |   10 ++++
 t/t4034/cpp/post         |    5 ++
 t/t4034/cpp/pre          |    4 ++
 t/t4034/csharp/expect    |   12 +++++
 t/t4034/csharp/post      |    8 ++++
 t/t4034/csharp/pre       |    8 ++++
 t/t4034/fortran/expect   |   12 +++++
 t/t4034/fortran/post     |    7 +++
 t/t4034/fortran/pre      |    7 +++
 t/t4034/html/expect      |    7 +++
 t/t4034/html/post        |    2 +
 t/t4034/html/pre         |    2 +
 t/t4034/java/expect      |   11 +++++
 t/t4034/java/post        |    6 +++
 t/t4034/java/pre         |    6 +++
 t/t4034/objc/expect      |   11 +++++
 t/t4034/objc/post        |    7 +++
 t/t4034/objc/pre         |    7 +++
 t/t4034/pascal/expect    |   12 +++++
 t/t4034/pascal/post      |    7 +++
 t/t4034/pascal/pre       |    7 +++
 t/t4034/php/expect       |    7 +++
 t/t4034/php/post         |    2 +
 t/t4034/php/pre          |    2 +
 t/t4034/python/expect    |    8 ++++
 t/t4034/python/post      |    3 +
 t/t4034/python/pre       |    3 +
 t/t4034/ruby/expect      |    7 +++
 t/t4034/ruby/post        |    2 +
 t/t4034/ruby/pre         |    2 +
 t/t4034/tex/expect       |    9 ++++
 t/t4034/tex/post         |    4 ++
 t/t4034/tex/pre          |    4 ++
 userdiff.c               |    2 +-
 41 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

-- 
1.7.3.3.807.g6ee1f

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]