On Tue, Dec 28, 2021 at 3:53 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Tue, Dec 28 2021, Elijah Newren wrote:
>
> > On Tue, Dec 28, 2021 at 8:32 AM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
> >> > If you'd like a semi-stable grouping across similar git versions the
> >> > "file/func" pair should be Good Enough for most purposes. Some functions
> >> > might emit multiple errors, but you'd probably want to group them as
> >> > similar enough anyway.
> >
> > Why would we want to group different errors? Isn't the point to
> > figure out which error is being triggered the most (or which errors)?
> > This sounds like it'd leave us with more investigation work to do.
>
> Ideally you wouldn't, i.e. the goal here is to get some approximation of
> a unique ID for an error across versions.
>
> But unless we're going to assign something like MySQL's error ID's
> manually any automatic method we pick is only going to be an
> approximation.

I like the way you frame it. I agree.

> So the question is whether we can have something that's good enough. The
> current "fmt" feature is fragmented by i18n. That's fixable (at the cost
> of quite a lot of lines changed), but would something even more succinct
> be good enough?
>
> Which is why I suggested file/function, i.e. it'll have some
> duplication, but for an error dashboard using trace2 data I'd think it's
> probably good enough.
>
> But maybe not. I just wanted to ask about it as a quick question...

I think for determining the most frequently triggered errors,
fragmentation is a minor issue, though you are right to call it out.
In particular, having the counts of issues separated by language might
mean that when we pick the top N errors, some of those in the top N
wouldn't really be in the top N if we had them correctly combined with
the other translations (and we also might get duplicates within our
chosen top N, since an English and a German translation of the same
error are both in the top N of the fragmented counts). Pretty unlikely
to be a problem in practice, though, and rather trivial to work around
once we have the data collected and are looking into it. Even in the
really unlikely event that I was trying to fix a "top N" problem and
accidentally ended up with a "top N+2" problem, I'm still dealing with
a "real error" that users are hitting. Any work I do to fix it will
help people facing a real problem.

In contrast, coalescing of errors to me would be a major issue. Let's
say I look at the top error, as reported by file/function. But that
one error is from a function that has four error paths. If I take a
guess at one of those error paths and try to fix it, I might be chasing
ghosts and completely wasting my time. My first step would have to be
going back to the drawing board and attempting to collect data about
what error the user was actually hitting (a rather lengthy process,
especially in attempting over a period of weeks/months to cajole users
to upgrade their git versions to get the new logging) -- but that was
exactly what this trace2 stuff was supposed to be doing in the first
place, so the file/function approximation choice defeats the purpose of
this error logging. It sounds like a deal breaker to me.

My gut instinct is that I'd take nearly any level of fragmentation over
the possible coalescing of separate errors. I think the fragmentation
solutions probably fall under the "good enough" category.

So, for example, the file/line number might be good enough.
It's a lot more fragmentation than different languages, though, and it
also suffers from the problem that it's hard to tell if new git versions
are fixing some of the "top N" problems (because new git versions would
have different line numbers and thus represent the top N problems
differently, whereas the fmt-based fragmentation will at least be
relatively consistent in its representation of errors across git
versions). But if the fmt solution was super problematic for some
other reasons, I'd gladly take file/line-number over file/function.

So, of the solutions presented so far, the "fmt" feature seems to me to
be the best reasonable effort approximation.

Anyway, just my $0.02...
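
To make the fragmentation-versus-coalescing distinction concrete, here
is a toy sketch. Everything in it is made up for illustration: the file
and function names, the line numbers, the messages (including the
German rendering) and the counts. It is not real trace2 output, and the
record layout is only an assumption about what a dashboard might
collect.

```python
from collections import Counter
from typing import NamedTuple


class ErrorEvent(NamedTuple):
    """Hypothetical error record; field names are illustrative only."""
    file: str
    func: str
    line: int
    fmt: str  # format string as the user saw it, i.e. possibly translated


# Two distinct errors raised from the same (hypothetical) function.
# The first error shows up in two languages, the second in one.
EVENTS = (
    [ErrorEvent("example.c", "do_thing", 120, "cannot open '%s'")] * 5
    + [ErrorEvent("example.c", "do_thing", 120, "kann '%s' nicht öffnen")] * 3
    + [ErrorEvent("example.c", "do_thing", 145, "bad header in '%s'")] * 4
)


def count_by(events, key):
    """Count events per grouping key."""
    return Counter(key(e) for e in events)


# Grouping by fmt fragments the first error into one bucket per language,
# but never mixes the two distinct errors.
by_fmt = count_by(EVENTS, lambda e: e.fmt)

# Grouping by file/function coalesces both errors into a single bucket,
# so the count no longer says which error path users are hitting.
by_file_func = count_by(EVENTS, lambda e: (e.file, e.func))

# Grouping by file/line keeps the errors apart within one version, but
# the keys would change whenever the line numbers shift.
by_file_line = count_by(EVENTS, lambda e: (e.file, e.line))

for name, counts in [("fmt", by_fmt),
                     ("file/func", by_file_func),
                     ("file/line", by_file_line)]:
    print(name, dict(counts))
```

With these made-up numbers, the fmt grouping yields three buckets for
two real errors (fragmented, but each bucket still names a real error),
while file/func yields one bucket of 12 that hides which of the two
error paths is responsible.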