Conrad Irwin <conrad.irwin@xxxxxxxxx> writes: >> in order to avoid uselessly spewing garbage at you while reminding you >> that the command is not showing the whole truth and you _might_ want to >> look into the elided file if you wanted to with "grep -a world hello.o". >> Compared to this, it feels that the result your patch goes too far and >> hides the truth a bit too much to my taste. Maybe it is just me. > > I used to use this approach, hooking into the "diff" attribute directly to mark > a file as binary, however that was clearly a hack. After thinking about this a bit more, I have to say I disagree that it is a hack. I agree that the issue you are trying to address is real, but I think the approach taken by the patch is flawed on three counts: - You cannot do "grep -a" equivalent once you set your "-grep" attribute; - The user has flexibility to set "diff" and "grep" independently, which is an unnecessary complication [*1*]; and - The user does not get any hint that potential hits are elided, like normal "grep" would do. I also think we can fix the above problems, while simplifying the external interface visible to the end users. So let's step back a bit and take a look at the handling of files for which you do not want to see patch output and/or you do not want to see grep hits, in a fictional but realisitic use scenario. I am building a small embedded applicance that links with a binary blob firmware, xyzzy.so, supplied by an upstream vendor, and this blob is updated from time to time. In order to keep track of what binary blob went into which product release, this binary blob is stored in the repository together with the source files. Because it is useless to view binary changes in "git log -p" output, I would naturally mark this as "binary". Now, is it realistic to expect that I might want to see "grep" hits from this binary blob? I do not think so [*2*]. When I say "xyzzy.so is a binary" in my .gitattributes file, according to the current documentation, I am saying that "do not run diff, and also do not run EOL conversion on this file" [*3*], but from the end user's point of view, it is natural that I am entitled to expect much more than that. Without having to know how many different kinds of textual operations the SCM I happen to be using supports, I am telling it that I do not want to see _any_ textual operations done on it. If a new version of "git grep" learns to honor the attributes system, I want my "this is binary" to be considered by the improved "git grep". If you think about it this way from the very high level description of the problem, aka, end user's point of view, it is fairly clear that tying the "binary" attribute to "git grep" to allow us to override the built-in buffer_is_binary() call you see in grep.c gives the most intuitive result, without forcing the user to do anything more than they are already doing. Because "binary" decomposes to "-diff" and "-text", we have two choices. We _could_ say "-text" should be the one that overrides buffer_is_binary() autodetection, but using "-diff" instead for that purpose opens the door for us to give our users even more intuitive and consistent experience. Suppose that this binary blob firmware came with an API manual formatted in PDF, xyzzy.pdf, also supplied by the vendor. It is also kept in the repository, but again, running textual diff between updated versions of the PDF documentation would not be very useful. I however may have a textconv filter defined for it to grab the text by running pdf2ascii. Now if my "git show --textconv xyzzy.pdf" has an output line that says a string "XYZZY API 2.0" was added to the current version, wouldn't it be natural for me to expect that "git grep --textconv 'XYZZY API' xyzzy.pdf" to find it [*4*]? As an added bonus, if you truly want to omit _any_ hit from "git grep" for your minified.js files, you can easily do so. Just define a textconv filter that yields an empty string for them, and "grep --textconv" won't produce any matches, not even "Binary file ... matches". So I am inclined to think that reusing the infrastructure we already have for the `diff` attribute as much as possible would be a better design for this new feature you are proposing. The name of the attribute `diff` is somewhat unfortunate, but the end user interface to the feature at the highest level is already called "binary" attribute (macro), and our low-level documentation can give precise meaning of the attribute while alluding to the origin of the name so that intelligent readers would understand it pretty easily (see attached patch). Besides, we already tell the users in "Marking files as binary" section to mark them with "-diff" in the attributes file if they want their contents treated as binary. [Footnotes] *1* An unused flexibility like this does nothing useful, other than forcing the user to flip two attributes always as a pair, when one should suffice. *2* The essence of my previous message was "why it is insufficient to mark them binary, i.e. uninteresting for the purpose of all textual operations not just grep but also for diff." which was unanswered. *3* This is because "binary" is defined as an attribute macro that expands to "-diff -text". *4* This also suggests that the infrastructure we already have for driving "git diff" is a good match for "git grep". The "funcname" patterns, which "git grep -p" already can borrow and use from "diff" infrastructure, is another indication that using `diff` is a good match for "git grep". Documentation/gitattributes.txt | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index a85b187..63844c4 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -388,9 +388,18 @@ Generating diff text `diff` ^^^^^^ -The attribute `diff` affects how 'git' generates diffs for particular -files. It can tell git whether to generate a textual patch for the path -or to treat the path as a binary file. It can also affect what line is +The attribute `diff` affects if `git` treats the contents of file as text +or binary. Historically, `git diff` and its friends were the first to use +this attribute, hence its name, but the attribute governs textual +operations in general, including `git grep`. + +For `git diff` and its friends, differences in contents marked as text +produces a textual patch, while differences in non-text are reported only +as "Binary files differ". For `git grep`, matches in contents marked as +text are reported by showing lines that contain them, while matches in +non-text are reported only as "Binary file ... matches". + +This attribute can also affect what line is shown on the hunk header `@@ -k,l +n,m @@` line, tell git to use an external command to generate the diff, or ask git to convert binary files to a text format before generating the diff. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html