On 5/19/21 7:34 PM, Jonathan Nieder wrote:
Hi,

(Danger, jrn is wading into error handling again...)

At $DAYJOB we are setting up some alerting for some bot fleets and developer workstations, using trace2 as the data source. Having trace2 has been great --- combined with gradual weekly rollouts of "next", it helps us to understand quickly when a change is creating a regression for users, which hopefully improves the quality of Git for everyone.

One kind of signal we haven't been able to make good use of is error rates. The problem is that a die() call can be an indication of

 a. the user asked to do something that isn't sensible, and we kindly rebuked the user
 b. we contacted a server, and the server was not happy with our request
 c. the local Git repository is corrupt
 d. we ran out of resources (e.g., disk space)
 e. we encountered an internal error in handling the user's legitimate request
...

For the error event that `error()`, `die()`, and friends generate, I emit both the fully formatted error message and the format string. The latter, if used as a dictionary key, would let you group like events from different processes without worrying about the filename, blob id, remote name, etc. in any one particular instance.

Would that be sufficient as an error classification and something that you can key off of in your post-processing?

Granted, the same format string might be used in multiple places in the source, but I also provide the source filename and line number.

If it turns out that all of the error events come from "usage.c" (i.e. error_builtin() or die_builtin()), then maybe we need to look at another way of wrapping those calls to pass the file and line number of the actual caller. I hesitated to do that because of the existing indirection tricks in usage.c WRT `set_error_routine()` and friends.

(And that assumes that the format string is a viable solution for your problem.)

Jeff
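
To make the grouping idea concrete, here is a rough sketch of what the post-processing might look like. It assumes the trace2 event target (one JSON object per line), where each "error" event carries "msg", "fmt", "file", and "line" fields. The function name `count_errors_by_format` and the sample lines are hypothetical, made up for illustration only.

```python
import json
from collections import Counter


def count_errors_by_format(lines):
    """Count trace2 "error" events, keyed by (fmt, file, line).

    `lines` is any iterable of strings, one JSON event per string.
    The function name and key layout are illustrative assumptions.
    """
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict) or event.get("event") != "error":
            continue
        # The format string stays the same across processes even when
        # the formatted message differs (path, blob id, remote name).
        key = (event.get("fmt"), event.get("file"), event.get("line"))
        counts[key] += 1
    return counts


# Made-up sample events: two errors that share a format string, plus one
# unrelated event that should be ignored.
sample_lines = [
    '{"event":"error","file":"usage.c","line":25,'
    '"msg":"invalid option: --cahced","fmt":"invalid option: %s"}',
    '{"event":"error","file":"usage.c","line":25,'
    '"msg":"invalid option: --stgaed","fmt":"invalid option: %s"}',
    '{"event":"version","file":"common-main.c","line":38,"evt":"3"}',
]

sample_counts = count_errors_by_format(sample_lines)
for (fmt, src_file, src_line), n in sample_counts.most_common():
    print(f"{n:5d}  {src_file}:{src_line}  {fmt}")
```

Including the file and line in the key keeps two call sites that happen to share a format string apart. If every event reports "usage.c", that part of the key adds nothing, and grouping on the format string alone would be the fallback.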