Re: git diff looping?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Really, that performance is so bad that I'm beginning to wonder if I am
somehow measuring something wrong. How could they ship something so
crappy through so many versions?

Because without some care in the matcher, the regex can be exponential. This happens because you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_] and ironically it also causes the regex not to work as intended; for example "catch(" can match the complex part of the regex (e.g. the first repetition can be "c" and the second can be "atch".

We can make it faster and more correct at the expense of additional complication.

Starting from:

^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$

we have to:

1) move [ \t] at the end of the repeated subexpression so that it removes the need for the [ \t] after

^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]*){2,}\([^;]*)$

2) make sure that at least one space/tab is eaten on all but the last occurrence of the repeated subexpression. To this end the LHS of {2,} is duplicated, once with [ \t]+ and once with [ \t]*. The repetition itself becomes a + since the last occurrence is now separately handled:

^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*
[ \t]*\([^;]*)$

Paolo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]