Really, that performance is so bad that I'm beginning to wonder if I am
somehow measuring something wrong. How could they ship something so
crappy through so many versions?
Because without some care in the matcher, the regex can be exponential.
This happens because you can backtrack arbitrarily from [A-Za-z_0-9]*
into [A-Za-z_] and ironically it also causes the regex not to work as
intended; for example "catch(" can match the complex part of the regex
(e.g. the first repetition can be "c" and the second can be "atch".
We can make it faster and more correct at the expense of additional
complication.
Starting from:
^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$
we have to:
1) move [ \t] at the end of the repeated subexpression so that it
removes the need for the [ \t] after
^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]*){2,}\([^;]*)$
2) make sure that at least one space/tab is eaten on all but the last
occurrence of the repeated subexpression. To this end the LHS of {2,}
is duplicated, once with [ \t]+ and once with [ \t]*. The repetition
itself becomes a + since the last occurrence is now separately handled:
^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*
[ \t]*\([^;]*)$
Paolo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html