Re: abrt + X Error => zillions of duplicate bug reports?

Karel Klic <kklic@xxxxxxxxxx> · Wed, 25 Nov 2009 10:20:09 +0100

Hi Adam,

please see below.

On 11/24/2009 08:15 PM, Adam Williamson wrote:
On Sun, 2009-11-22 at 19:21 +0100, Martin Sourada wrote:
So,

since I've already received 3 separate bug reports caused by BadIDChoice
X Error in subtitleeditor [1][2][3] (haven't had enough time to debug
and try to fix it yet though) by abrt, I wonder if there is any room for
duplicate detection improvement in these cases, or if we are doomed to
zillions of duplicates in rhbz? (btw. otherwise abrt is awesome, IMHO
the bugreports from abrt are much more useful than before :-)

We discussed this issue at the Bugzappers meeting today. BZ would like
to register that the high level of duplicates reported by abrt is a
significant issue for triage work. We're not sure we can sustainably
triage some major components (e.g. Firefox) if the current situation
continues.

We came up with several possible courses of action. First, we
acknowledge that abrt team is working on improving duplicate detection,
but Matej noted that this is intrinsically hard work and abrt will
likely never be able to eliminate or even come close to eliminating
duplicate reporting.

The algorithm for duplicate detection in the currently released version 
of ABRT is very rudimentary: it removes only the most obvious duplicates 
in simple programs. As far as I know, it does not work for applications
with a variable number of threads (e.g. Firefox).

Fortunately, we now have a new algorithm for duplicate detection which
handles all of these cases significantly better. Most of the code is
written, but it needs some testing before release. I guess it will
take two weeks or so to finish it and to make sure it works well.

An important attribute of the new algorithm is that it errs on the side
of false duplicates: it will much more often say that a bug is a
duplicate of another bug, even if that is sometimes not the case. This
should make the abrt bug flow sustainable, and then we can slowly improve
the detection mechanism to be more accurate.
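
To make the trade-off concrete, here is a rough sketch. It is not ABRT's
actual code; the function names, the sample reports and the "depth"
parameter are all made up for illustration. It assumes each report has
already been reduced to the function names of the crashing thread, so the
number of other threads does not matter. A smaller depth merges more
reports, i.e. it errs further on the side of false duplicates.

```python
import hashlib


def crash_signature(crash_thread_frames, depth=3):
    """Hash the top `depth` resolved function names of the crashing thread.

    Hypothetical helper: unresolved frames ("??") are skipped.
    """
    top = [name for name in crash_thread_frames if name != "??"][:depth]
    return hashlib.sha1("\n".join(top).encode("utf-8")).hexdigest()


def group_duplicates(reports, depth=3):
    """Group report ids whose signatures match, in first-seen order."""
    groups = {}
    for report_id, frames in reports:
        groups.setdefault(crash_signature(frames, depth), []).append(report_id)
    return list(groups.values())


# Made-up sample data: r1 and r2 differ only below the top three frames.
reports = [
    ("r1", ["gdk_x_error", "_XError", "handle_error", "main"]),
    ("r2", ["gdk_x_error", "_XError", "handle_error", "gtk_main", "main"]),
    ("r3", ["g_free", "foo", "main"]),
]

print(group_duplicates(reports))
```

With depth=3 this groups r1 and r2 together and keeps r3 separate. Note
that every X error ends up in the same handler frames, so a scheme like
this would also merge unrelated X errors, which is exactly the kind of
false duplicate described above.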

Second, we wondered if abrt team might be able to assist in running any
improved duplicate detection mechanisms over already-reported bugs in
Bugzilla retrospectively. We will follow up with them about that.

Third, we agreed to look at methods used in GNOME and other Bugzillas to
cope with high levels of duplicate reporting from automated tools, such
as extracting significant sections of tracebacks as bug comments to make
manual duplicate detection faster and easier.

Good idea.
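
A minimal sketch of what such an extraction could look like, assuming the
attached backtrace is gdb-style text with frame lines of the form
"#N  0xADDR in function (args) ...". The regular expression, the function
name and the sample backtrace are my own illustration, not something
Bugzilla or ABRT provides, and the pattern is deliberately simple (C++
names containing spaces would need more care). It summarizes only the
first thread found in the text.

```python
import re

# Frame number, optional "0xADDR in", then the function name before "(".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def summarize_backtrace(text, max_frames=5):
    """Return the top frames of the first thread as short '#N name' lines."""
    lines = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if not match:
            continue
        if match.group(1) == "0" and lines:
            break  # a second frame #0 means the next thread has started
        lines.append("#%s %s" % (match.group(1), match.group(2)))
        if len(lines) == max_frames:
            break
    return "\n".join(lines)


# Made-up sample backtrace for illustration only.
sample = """\
Thread 1 (Thread 0x7f0 (LWP 2)):
#0  0x0000003a in raise () from /lib64/libc.so.6
#1  0x0000003b in abort () from /lib64/libc.so.6
#2  0x0000003c in gdk_x_error (display=0x1, error=0x2) at gdkmain-x11.c:466
#3  main (argc=1, argv=0x7) at main.cc:80
"""

print(summarize_backtrace(sample, max_frames=3))
```

A short summary like this, posted as a comment, would let a triager
compare two reports at a glance without opening the attachments.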

Finally, we considered - and rather approved of - the proposal that's
been floated on this list (and was floated in the meeting by Will
Woods) to consider using the mechanism used by the kernel developers for
kernel oopses: instead of being reported direct to Bugzilla, these are
reported to an intermediate site (kerneloops.org) and can be promoted
from there to Bugzilla if appropriate. Will is planning to work on this
idea after finishing up some AutoQA work, and will talk to abrt team
about it and see if they are interested in helping. He would welcome
contact from anyone else who's interested in helping with that, too.

Once the duplicate detection works, it would be a loss not to have the
crashes directly in Bugzilla. I often see crashes reported by ABRT
being traced to the responsible code and fixed.

If we fail to deliver better detection, then an intermediate site is
certainly a better target for thousands of duplicates than Bugzilla.

I would propose creating an intermediate site as a target for users
who are not experienced enough to create a Bugzilla account and
respond to questions, or who simply do not care to. They could then
report almost automatically, and we would get a lot of backtraces and
supporting data that are currently lost. However, this must be thought
through first (there are security issues with backtraces).

That's all, really - I just took an action item to pass on our thoughts
about this :)

Best regards,
Karel
