Stepan Kasal <kasal@xxxxxx> writes: > From: Johannes Schindelin <johannes.schindelin@xxxxxx> > Date: Mon, 8 Nov 2010 16:10:43 +0100 > > Incidentally, this makes grep -I respect the "binary" attribute (actually, > the "-text" attribute, but "binary" implies that). > > Since the attributes are not thread-safe, we now need to switch off > threading if -I was passed. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> > Signed-off-by: Stepan Kasal <kasal@xxxxxx> > --- > > Hi, > this patch has been in msysgit for 3.5 years. > Stepan I do not think checking 'text' is the right way to do this. The attribute controls the eof normalization, and people sometimes want to keep CRLF terminated files in the repository no matter what the platform is (an example I heard is a sample protocol exchange over textual network protocol such as HTTP and SMTP), and the way to do so is to unset it. That would still let them look for patterns in and compare the changes to these paths. Looking for "Marking files as binary" in gitattributes(5) gives us a more authoritative alternative, I think. In short: - If 'diff' is Unset, or - If 'diff' is Set to X and diff.X.binary is true then the contents are not suitable for human consumption. I haven't thought things through to declare that "grep -I" would want to treat a PostScript file (which is an example in that section) as a binary file and refrain from trying to find substrings in it, but my gut feeling is that we probably should let it look inside such a file, so replacing 'text' with 'diff' in this patch and doing nothing about diff.*.binary would be the way to go. Given that the posted patch was older than 3.5 years, perhaps it needs updating to adhere the advice given in ab435611 (docs: explain diff.*.binary option, 2011-01-09). By the way, I wonder if the patch is about performance, implied by "do not bother" (to waste time looking inside), though. Is it an overall win to avoid looking inside "-diff" files, if that requires you to use only 1 core and leaving the other 7 idle? I think "the user tells us it is binary with attribute, and the user tells us not to look into binary with -I" is a lot better rationale to do this change , and that would characterise it as a correctness patch, not a performance patch. Thanks. > builtin/grep.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/builtin/grep.c b/builtin/grep.c > index 43af5b7..8073fbe 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -18,6 +18,7 @@ > #include "quote.h" > #include "dir.h" > #include "pathspec.h" > +#include "attr.h" > > static char const * const grep_usage[] = { > N_("git grep [options] [-e] <pattern> [<rev>...] [[--] <path>...]"), > @@ -163,6 +164,22 @@ static void work_done(struct work_item *w) > grep_unlock(); > } > > +static int skip_binary(struct grep_opt *opt, const char *filename) > +{ > + if ((opt->binary & GREP_BINARY_NOMATCH)) { > + static struct git_attr *attr_text; > + struct git_attr_check check; > + > + if (!attr_text) > + attr_text = git_attr("text"); > + memset(&check, 0, sizeof(check)); > + check.attr = attr_text; > + return !git_check_attr(filename, 1, &check) && > + ATTR_FALSE(check.value); > + } > + return 0; > +} > + > static void *run(void *arg) > { > int hit = 0; > @@ -173,6 +190,9 @@ static void *run(void *arg) > if (!w) > break; > > + if (skip_binary(opt, (const char *)w->source.identifier)) > + continue; > + > opt->output_priv = w; > hit |= grep_source(opt, &w->source); > grep_source_clear_data(&w->source); > @@ -379,6 +399,9 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int > continue; > if (!ce_path_match(ce, pathspec, NULL)) > continue; > + if (skip_binary(opt, ce->name)) > + continue; > + > /* > * If CE_VALID is on, we assume worktree file and its cache entry > * are identical, even if worktree file has been modified, so use > @@ -803,6 +826,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > string_list_append(&path_list, show_in_pager); > use_threads = 0; > } > + if ((opt.binary & GREP_BINARY_NOMATCH)) > + use_threads = 0; > > if (!opt.pattern_list) > die(_("no pattern given.")); > -- > 1.9.2.msysgit.0.335.gd2a461f > > -- -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html