Re: git diff looping?

Paolo Bonzini <paolo.bonzini@xxxxxxxxx> · Wed, 17 Jun 2009 10:46:21 +0200

> Really, that performance is so bad that I'm beginning to wonder if I am
> somehow measuring something wrong. How could they ship something so
> crappy through so many versions?

Because without some care in the matcher, matching this regex can take
exponential time. This happens because the matcher can backtrack
arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], so a single identifier can
be split between repetitions in many ways. Ironically, it also causes the
regex not to work as intended; for example "catch(" can match the complex
part of the regex (e.g. the first repetition can be "c" and the second can
be "atch").
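
To make both problems concrete, here is a small sketch in Python. Note
that Python's re module is only a stand-in for the regcomp/regexec
matcher that git actually uses, so this illustrates the idea rather than
git's real behaviour. The sample line and the helper count_splits are
made up for the illustration.

```python
import re

# The original funcname pattern from this thread, unchanged.
ORIGINAL = re.compile(
    r"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$"
)


def count_splits(n):
    """Count ways to cut a run of n letters into two or more non-empty pieces.

    Each of the n - 1 gaps between letters is either a cut or not, and at
    least one cut is needed, hence 2**(n - 1) - 1.
    """
    return 2 ** (n - 1) - 1


# A hypothetical line that is not a declaration: one word, then "(".
line = "catch(e)"
matched = ORIGINAL.match(line) is not None  # the word is split in two

# Candidate splits a naive backtracking matcher may have to try when the
# rest of the line fails to match.
growth = [(n, count_splits(n)) for n in (5, 10, 20, 30)]
```

The number of candidate splits roughly doubles with every extra
character, which is where the exponential worst case comes from in a
matcher that does not take care to avoid it.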

We can make it faster and more correct at the expense of additional 
complication.

Starting from:

^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$

we have to:

1) move [ \t]* to the end of the repeated subexpression, which removes
the need for the separate [ \t]* after the repetition:

^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]*){2,}\([^;]*)$

2) make sure that at least one space/tab is eaten on all but the last 
occurrence of the repeated subexpression.  To this end the LHS of {2,} 
is duplicated, once with [ \t]+ and once with [ \t]*.  The repetition 
itself becomes a + since the last occurrence is now separately handled:

^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\([^;]*)$
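
As a rough sanity check, the three patterns above can be compared on a
few sample lines. This is a sketch using Python's re module, not the
POSIX matcher git uses, and the sample lines are invented for the
illustration.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z_0-9]*"

# The starting regex and the result of each rewriting step.
PATTERNS = {
    "original": r"^[ \t]*(([ \t]*" + IDENT + r"){2,}[ \t]*\([^;]*)$",
    "step 1": r"^[ \t]*((" + IDENT + r"[ \t]*){2,}\([^;]*)$",
    "step 2": r"^[ \t]*((" + IDENT + r"[ \t]+)+" + IDENT + r"[ \t]*\([^;]*)$",
}

# Hypothetical input lines: two declarations, one non-declaration
# without whitespace, and one statement ending in ";".
SAMPLES = ["static int foo(void)", "\tint bar (int x)", "catch(e)", "foo();"]


def matches(pattern, line):
    """Return True if the pattern matches the whole line."""
    return re.match(pattern, line) is not None


results = {
    name: [matches(pattern, line) for line in SAMPLES]
    for name, pattern in PATTERNS.items()
}

for name, row in results.items():
    print(name, row)
```

On these samples the original and step 1 agree, including the unwanted
match on "catch(e)". Only step 2 rejects it, because every identifier
but the last must now be followed by at least one space or tab.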

Paolo