Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 24, 2017 at 6:42 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>
>> Rather, it's just to make the code easier to reason about. It's
>> confusing to debug this under threading & non-threading when the
>> threading codepaths redundantly compile a pattern which is never used.
>>
>> The reason the patterns are recompiled is as a side-effect of
>> duplicating the whole grep_opt structure, which is not thread safe,
>> writable, and munged during execution. The grep_opt structure then
>> points to the grep_pat structure where pattern or patterns are stored.
>>
>> I looked into e.g. splitting the API into some "do & alloc threadsafe
>> stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
>> execution time of grep_opt_dup() & pattern compilation is trivial
>> compared to actually executing the grep, so there was no point. Even
>> with the more expensive JIT changes to follow the most expensive PCRE
>> patterns take something like 0.0X milliseconds to compile at most[1].
>
> OK.
>
>> The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
>> --debug option to dump the parse tree", 2012-09-13) still works
>> properly with this change. It only emits debugging info during pattern
>> compilation, which is now dumped by the pattern compiled just before
>> the first thread is started.
>
> When opt is passed to run(), opt->debug is still true for the first
> worker thread.  As long as opt->debug never makes difference after
> compile_grep_patterns(opt) returns, I think the change in this patch
> safe.

Right, the --debug feature only impacts pattern compilation.

> I do not know if we want to rely on it, but we can explain it
> away by saying "we'll only debug the runtime behaviour for the first
> worker only", or something, so it is not a big deal either way.

I think it's a pointless distraction to start speculating in this
commit message what we're going to do with --debug it if it ever
starts emitting some debugging information at pattern execution time.

As an aside, I'd very much like to remove both --debug and the
--and/--or/--all-match, gives some very rough edges in the UI and how
easy it is to make that feature error or segfault, I suspect you might
be the only one using it.

There are pattern matching optimizations I'd like to do that are much
more of a pain with that feature around. It's easy to AND multiple
regexes together into one match via -e, but when you have to deal with
negation and arbitrarily complex chained & parenthesized  AND/OR you
end up having to run your custom state machine on every line with
multiple regex matches per line.

The system grep doesn't have this feature, and people seem to do
without it. The motivation for it isn't explained in commit 79d3696cfb
("git-grep: boolean expression on pattern matching.", 2006-06-30), but
I suspect it's a hack around not being able to do "git grep ... | git
grep -v ...", which is how you'd do "I'd like to match this, but not
that" with the system grep.

Just supporting that would be much easier than supporting the and/or
matching machinery.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]