Re: [PATCH v4] userdiff: improve java hunk header regex

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Johannes Sixt <j6t@xxxxxxxx> writes:

Hi Hannes & Junio,

> These new tests are very much appreciated. You do not have to go wild
> with that many return type tests; IMO, the simple one and the most
> complicated one should do it. (And btw, s/cart/card/)

Well, they appeared naturally as a result during development and made it
easier to spot errors when you know up to which level of complexity it
still worked.  Is there a stronger reason to remove tests which might
not be needed, e.g., runtime cost on some CI machines?

>> -	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
>> +         "^[ \t]*("
>> +         /* Class, enum, and interface declarations: */
>> +         /*   optional modifiers: public */
>> +         "(([a-z]+[ \t]+)*"
>> +         /*   the kind of declaration */
>> +         "(class|enum|interface)[ \t]+"
>> +         /*   the name */
>> +         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>> +         /* Method & constructor signatures: */
>> +         /*   optional modifiers: public static */
>> +         "|(([a-z]+[ \t]+)*"
>> +         /*   type params and return types for methods but not constructors */
>> +         "("
>> +         /*     optional type parameters: <A, B extends Comparable<B>> */
>> +         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
>> +         /*     return type: java.util.Map<A, B[]> or List<?> */
>> +         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
>> +         /*   end of type params and return type */
>> +         ")?"
>> +         /*   the method name followed by the parameter list: myMethod(...) */
>> +         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>> +         ")$",
>
> I don't see the point in this complicated regex. Please recall that it
> will be applied only to syntactically correct Java text. Therefore,
> you do not have to implement all syntactical corner cases, just be
> sufficiently permissive.

I actually find it easier to understand if it is broken up into more
concrete alternatives and parts which are commented instaed of one
opaque "permissively match everything in one alternative" regex.  It
shows the intent of what you want to match.  But YMMV and since Junio
agrees with you, I'm fine with that approach.

> What is wrong with
>
> 	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[
> \t]*\\([^;]*)$",

That doesn't work for

  <T> List<T> foo()

or

  <T extends Foo & Bar> T foo()

so at least it needs to include &<> in the first group, too.

Also, it doesn't match class/enum/interface declarations anymore, so

  class Foo {
    String x = "ChangeMe";
  }

will have an empty hunk header.

Another thing I've noticed (with my suggested patch) is that I should
not try to match constructor signatures.  I think that's impossible
because they are indistinguishable from method calls, e.g., in

  public class MyClass {
      MyClass(String RIGHT) {
          someMethodCall();
          someOtherMethod(17)
              .doThat();
          // Whatever
          // ChangeMe
      }
  }

there is no regex way to prefer MyClass(String RIGHT) over
someOtherMethod().

So all in all, I'd propose this version in the next patch version:

--8<---------------cut here---------------start------------->8---
PATTERNS("java",
	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
         "^[ \t]*("
         /* Class, enum, and interface declarations */
         "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
         /* Method definitions; note that constructor signatures are not */
         /* matched because they are indistinguishable from method calls. */
         "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
         ")$",
	 /* -- */
	 "[a-zA-Z_][a-zA-Z0-9_]*"
	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
	 "|[-+*/<>%&^|=!]="
	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
--8<---------------cut here---------------end--------------->8---

That works for all my test cases (which I have also altered to include
the method calls from above before the ChangeMe) except for
java-constructor where it shows

  public class MyClass {

instead of

      MyClass(String RIGHT) {

in the hunk header which is expected as explained earlier and in the
comment.

Does that seem like a good middle ground?

Bye,
Tassilo



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux