Hello,
thanks for your answer.
I understand that PCRE's match limit can be hit for some files, but
in such cases git grep should proceed with the other files and print at
the end, on stderr, which files the pattern could not be applied to. Such
behaviour would be more useful than the current one.
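Until something like that exists, the behaviour I have in mind can be
approximated from outside git. The following is only a rough sketch, not
anything git itself provides: the function name grep_each_file and its
injectable "run" parameter are made up for illustration, and it assumes
that git grep exits with 0 on a match, 1 on no match, and some other
status on an error such as the fatal one quoted below.

```python
import subprocess


def grep_each_file(pattern, files, run=subprocess.run):
    """Run `git grep -P` once per file and keep going after failures.

    Returns (matches, failed): the collected stdout of successful runs,
    and (path, stderr) pairs for files where git grep reported an error.
    """
    matches, failed = [], []
    for path in files:
        proc = run(
            ["git", "grep", "-P", "-e", pattern, "--", path],
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            # At least one line in this file matched.
            matches.append(proc.stdout)
        elif proc.returncode != 1:
            # Assumed: any status other than 0 or 1 means an error,
            # e.g. "fatal: pcre_exec failed with error code -8".
            failed.append((path, proc.stderr.strip()))
    return matches, failed
```

The list of files could come from "git ls-files". Starting one process per
file is slow on a large tree, so this is a stopgap rather than a substitute
for doing it inside git grep.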
Regards
Dilian
On 11/05/2017 03:16 AM, Jeff King wrote:
On Sun, Nov 05, 2017 at 01:06:21AM +0100, Дилян Палаузов wrote:
with git 2.14.3 linked with libpcre.so.1.2.9 when I do:
git clone https://github.com/django/django
cd django
git grep -P "if.*([^\s])+\s+and\s+\1" django/contrib/admin/static/admin/js/vendor/select2/select2.full.min.js
the output is:
fatal: pcre_exec failed with error code -8
Code -8 is PCRE_ERROR_MATCHLIMIT. And "man pcreapi" has this to say:
The match_limit field provides a means of preventing PCRE from
using up a vast amount of resources when running patterns that
are not going to match, but which have a very large number of
possibilities in their search trees. The classic example is a
pattern that uses nested unlimited repeats.
Internally, pcre_exec() uses a function called match(), which
it calls repeatedly (sometimes recursively). The limit set by
match_limit is imposed on the number of times this function is
called during a match, which has the effect of limiting the
amount of backtracking that can take place. For patterns that
are not anchored, the count restarts from zero for each position
in the subject string.
When pcre_exec() is called with a pattern that was successfully
studied with a JIT option, the way that the matching is executed
is entirely different. However, there is still the possibility
of runaway matching that goes on for a very long time,
and so the match_limit value is also used in this case (but in
a different way) to limit how long the matching can continue.
The default value for the limit can be set when PCRE is built;
the default default is 10 million, which handles all but the
most extreme cases. You can override the default by suppling
pcre_exec() with a pcre_extra block in which match_limit is
set, and PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If
the limit is exceeded, pcre_exec() returns
PCRE_ERROR_MATCHLIMIT.
So your pattern is just really expensive and is running afoul of pcre's
backtracking limits (and it's not helped by the fact that the file is
basically one giant line).
There's no way to ask Git to specify a larger match_limit to pcre, but
you might be able to construct your pattern in a way that involves less
backtracking. It looks like you're trying to find things like "if foo
and foo"?
Should the captured term actually be "([^\s]+)" (with the "+" on the
_inside_ of the capture)? Or maybe I'm just misunderstanding your goal.
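
To illustrate the difference in what gets captured, here is a small sketch
using Python's re module rather than PCRE. That is an assumption of
convenience: for this particular construct the two engines capture the same
text, but the example says nothing about PCRE's match limit. The sample
strings are made up.

```python
import re

# Original pattern: the "+" is outside the group, so the group is
# re-captured on every repetition and ends up holding one character.
original = re.compile(r"if.*([^\s])+\s+and\s+\1")

# Variant with the "+" inside the group: the group holds the whole
# run of non-space characters.
fixed = re.compile(r"if.*([^\s]+)\s+and\s+\1")

line = "if foo and foo"
no_match = original.search(line)
whole_word = fixed.search(line).group(1)
last_char = original.search("if foo and oops").group(1)

print(no_match)    # None
print(whole_word)  # foo
print(last_char)   # o
```

So the original pattern only checks that the character before " and"
reappears after it, which is probably not what you meant.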
-Peff