On Mon, Nov 06 2017, Jeff King jotted: > On Mon, Nov 06, 2017 at 12:50:45PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> Some replies to the thread in general, didn't want to spread this out >> into different replies. >> >> * Yes this sucks. >> >> * Just emitting a warning without an appropriate exit code would suck >> more, would break batch jobs & whatnot that expcept certain results >> from grep. > > That was my first thought, too, but something that does: > > git grep foo && echo found > > would behave basically the same. Do you mean here scripts that actually > do: > > git grep foo > case "$?" in > 0) echo found ;; > 1) echo not found ;; > *) echo wtf? ;; > esac > > I agree it would be nice to at least have _some_ way to distinguish > between those final two cases. Maybe we should emit a different exit code, but I just had in mind that we could continue but the same non-zero as when the grep fails now, but with an additional warning. > Though something like "git log --grep" is even more complicated. We > perhaps _would_ want to distinguish between: > > - match (in which case we print the commit) > > - no match (in which case we do not) > > - error (in which case we do not print, but also mark the exit code as > failing) > >> * As you point out it would be nice to print out the file name we >> didn't match on, we'd need to pass the grep_source struct down >> further, it goes as far as grep_source_1 but stops there and isn't >> passed to e.g. look_ahead(), which calls patmatch() which calls the >> engine-specific matcher and would need to report the error. We could >> just do this, would slow down things a bit (probably trivally) but we >> could emit better error messages in genreal. > > I'm not sure if the grep_source has enough information for all cases. > E.g., if you hit an error while grepping in commit headers, you'd > probably want to mention the oid of the commit. There's an "identifier" > field in the grep_source, but it's opaque. > > The caller may also want to do more things than just print an error > (like the exit code adjustment I mentioned above). Which implies to me > we should be passing the error information up, not trying to bring the > context down. Indeed, I was just thinking of the case where we're grepping a file in the tree, not when the engine is used for --grep et al. >> * You can adjust these limts in PCRE in Git, although it doesn't help >> in this case, you just add (*LIMIT_MATCH=NUM) or >> (*LIMIT_RECURSION=NUM) (or both) to the start of the pattern. See >> pcresyntax(3) or pcre2syntax(3) man pages depending on what version >> you have installed. > > I saw that in the pcre manual, but I had the impression you can't > actually raise the limits above the defaults with that, only lower them. You can, you just can't set them to anything less conservative than limits already set via the C API, but we don't set any of those. I.e. PCRE allows you to say expose a regex field in a web form (as we do with gitweb) with really conservative settings that can't be overriden via a (*LIMIT) set in the pattern. But since we don't use that C API PCRE runs in a mode where the user gets set whatever limit they want in the pattern (or other pattern-altering switch), which makes sense for interactive git-grep use. >> * While regexec() won't return an error its version of dealing with >> this is (at least under glibc) to balloon CPU/memory use until the >> OOMkiller kills git (although not on this particular pattern). > > So in a sense our current behavior with pcre is the same. We just have > to provoke the death ourselves. ;) Indeed :)