Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> I think it's a pointless distraction to start speculating in this
> commit message about what we're going to do with --debug if it ever
> starts emitting some debugging information at pattern execution time.

OK.

> As an aside, I'd very much like to remove both --debug and
> --and/--or/--all-match, given some very rough edges in the UI and how
> easy it is to make that feature error out or segfault. I suspect you
> might be the only one using it.

I agree that rewriting "grep -e A -e B" to "grep -e A|B" as an
optimization is an interesting possibility to look into, and I can
understand that having to support "--and" and "--not" would
make such an optimization harder to implement. "-e A --and -e B"
must become "-e A.*B|B.*A", and as you add more terms your unified
pattern will grow combinatorially, at which point you would be better
off matching N patterns separately and combining the results.
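
To illustrate the growth, here is a rough Python sketch (not git's
implementation; the helper names are made up for this example, and the
terms are assumed to be plain literals that cannot overlap each other).
It builds the unified "--and" pattern by enumerating every ordering of
the terms and counts the resulting alternatives:

```python
import itertools
import re


def combined_and_pattern(terms):
    """Hypothetical helper: one regex matching lines that contain every term.

    Each ordering of the terms becomes one alternative, e.g. A.*B|B.*A.
    """
    orderings = itertools.permutations(terms)
    return "|".join(".*".join(ordering) for ordering in orderings)


def match_all_separately(terms, line):
    """Match each term on its own and combine the results with a logical AND."""
    return all(re.search(term, line) for term in terms)


two_terms = combined_and_pattern(["A", "B"])

# Number of alternatives in the unified pattern for 1..5 terms (N! of them).
alternative_counts = [
    len(combined_and_pattern([f"t{i}" for i in range(n)]).split("|"))
    for n in range(1, 6)
]

# On these sample lines both approaches agree.
sample_lines = ["xx A yy B", "B then A", "only A here", "neither"]
unified_hits = [bool(re.search(two_terms, line)) for line in sample_lines]
separate_hits = [match_all_separately(["A", "B"], line) for line in sample_lines]
```

With N terms the unified pattern carries N! alternatives, while the
separate approach only ever runs N matches per line.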

Have you ever seen a user run "ps | grep rogue | grep -v grep" to find
a rogue process to kill?  That would not work if the rogue process's
command line contains the word "grep".  Because "git grep" is often run
on files in order to find where the patterns appear, "git grep -e
pattern | grep -v unwanted" shares the same issue: the unwanted
pattern may appear in the filename, and the downstream "grep -v" may
filter out a valid hit.  This is why "--not" exists [*1*].  I agree
that emulating it within the same "concatenate patterns into one"
optimization you are envisioning may be hard.
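
A small Python sketch of the failure mode, using made-up paths and
line contents (it only imitates the shape of the "path:line" output and
is not how git grep is implemented):

```python
import re

# Hypothetical (path, line) pairs standing in for the files being searched.
LINES = [
    ("docs/unwanted/notes.txt", "pattern appears here"),
    ("src/main.c", "pattern next to unwanted text"),
    ("src/util.c", "pattern alone"),
]


def piped(lines):
    """Imitate: git grep -e pattern | grep -v unwanted"""
    output = [f"{path}:{text}" for path, text in lines if re.search("pattern", text)]
    # The downstream filter sees the path as well as the line content.
    return [row for row in output if not re.search("unwanted", row)]


def with_not(lines):
    """Imitate: git grep -e pattern --and --not -e unwanted"""
    return [
        f"{path}:{text}"
        for path, text in lines
        if re.search("pattern", text) and not re.search("unwanted", text)
    ]


piped_hits = piped(LINES)
not_hits = with_not(LINES)
```

The piped version drops the hit under docs/unwanted/ even though the
matching line itself never mentions "unwanted"; the "--not" version
keeps it because it only looks at the line content.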

Attempting to optimize "--all-match" would share similar difficulty
with "--and", and in addition your matching must now be done against
the entire buffer rather than line by line.  It was meant to make it
possible to say "find commits in which avarab@ talks about both regex
and log", i.e.

	$ git log --author=avarab@ --all-match --grep=log --grep=regex

This is not something you can emulate by piping the output of one grep
to another grep.
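
As a rough illustration of the difference (a Python sketch over made-up
commit messages, not the actual implementation), "--all-match" asks
that every pattern hit somewhere in the buffer, whereas chaining greps
can only ever require all patterns on the same line:

```python
import re

# Hypothetical commit messages keyed by made-up commit names.
MESSAGES = {
    "c1": "grep: speed up regex compilation\n\nThis also helps git log --grep.",
    "c2": "log: fix pager handling",
    "c3": "grep: use the regex engine for log output",
}


def any_line_matches(pattern, buffer):
    """True if at least one line of the buffer matches the pattern."""
    return any(re.search(pattern, line) for line in buffer.splitlines())


def all_match(patterns, buffer):
    """Every pattern must hit somewhere in the buffer, possibly on different lines."""
    return all(any_line_matches(pattern, buffer) for pattern in patterns)


def same_line_and(patterns, buffer):
    """All patterns must hit on one and the same line (what a grep pipeline gives)."""
    return any(
        all(re.search(pattern, line) for pattern in patterns)
        for line in buffer.splitlines()
    )


patterns = ["log", "regex"]
whole_buffer = sorted(c for c, msg in MESSAGES.items() if all_match(patterns, msg))
line_by_line = sorted(c for c, msg in MESSAGES.items() if same_line_and(patterns, msg))
```

Here c1 mentions "regex" in its subject and "log" in its body, so only
the whole-buffer check finds it.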

But none of the above means you have to give up optimizing.  

You can choose not to combine them into a single pattern if certain
constructions are hard, and do only the easy ones.  If you think
that harder combinations are not used very often, the result would
be faster for many cases while not losing useful features, which is
what we want.
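
One way to picture "do only the easy ones" is the following Python
sketch.  The tuple-based expression tree and the function names are
assumptions made up for this example and do not reflect grep.c: a
pattern expression is folded into a single alternation only when it
consists purely of "or" nodes, and anything using "and" or "not" falls
back to evaluating each pattern separately.

```python
import re


def try_combine(expr):
    """Return one regex for expr if it uses only atoms and "or", else None.

    An atom is a str; an inner node is a tuple (op, child, child, ...).
    """
    if isinstance(expr, str):
        return expr
    op, *children = expr
    if op != "or":
        return None  # "and"/"not" are the hard cases; do not combine them
    parts = [try_combine(child) for child in children]
    if any(part is None for part in parts):
        return None
    return "|".join(f"(?:{part})" for part in parts)


def evaluate(expr, line):
    """Slow path: match every atom separately and combine the results."""
    if isinstance(expr, str):
        return re.search(expr, line) is not None
    op, *children = expr
    if op == "or":
        return any(evaluate(child, line) for child in children)
    if op == "and":
        return all(evaluate(child, line) for child in children)
    if op == "not":
        return not evaluate(children[0], line)
    raise ValueError(f"unknown operator: {op}")


def matches(expr, line):
    """Use the single combined pattern when possible, else the slow path."""
    combined = try_combine(expr)
    if combined is not None:
        return re.search(combined, line) is not None
    return evaluate(expr, line)
```

For example, ("or", "A", "B") is matched with the single pattern
"(?:A)|(?:B)", while ("and", "A", ("not", "B")) still works but takes
the slow path.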


[Footnote]

*1* For human consumption, lack of "--not" may not hurt in the sense
    that there are workarounds (i.e. you can do without "| grep -v
    unwanted" and filter out irrelevant hits by eyeballing).  But it
    is essential when scripting and trying to be precise.


