Re: Abrt (was Re: Most buggy packages)

Thank you Dave!
Those are exactly the kinds of ideas I was looking for.
Just a short summary of what we can do on the server (right now) to get this brainstorm going:

- it has all the RPM debuginfo packages, so getting symbol names or line numbers is not a problem (we actually do that already)

- it can extract backtraces from userspace coredumps (a rough sketch follows this list)
  - and Fedora users are sending them...
- it can extract backtraces from kernel coredumps
  - though we've actually never seen a Fedora user send a kernel core
- it's not a problem to run some custom scripts during the analysis

- so far it takes the component->owner mapping from the Fedora pkgdb; the bigger plan is to be more distro-agnostic, so we're not against using other data sources for the component->owner mapping

- we have all the backtraces from all the crashes processed by the server, so we can do a lot of data mining (deduplication, finding the common component in different crashes, ...)
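
For the userspace case, the extraction is roughly this (a simplified sketch;
the real server-side code handles more corner cases, and the paths in the
comment are made up):

    import subprocess

    def extract_backtrace(binary, corefile):
        """Run gdb in batch mode and return a backtrace of all threads."""
        # 'thread apply all bt' prints a backtrace for every thread in the core
        out = subprocess.check_output(
            ["gdb", "--batch", "-ex", "thread apply all bt", binary, corefile])
        return out.decode("utf-8", errors="replace")

    # hypothetical paths, just to show the call:
    # print(extract_backtrace("/usr/bin/someapp", "/var/spool/abrt/coredump"))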

To wrap it up: all of the ideas below are doable, but not without your help (so get ready for some emails from us ;)). Almost every package needs some special handling and we can't know them all, so it's up to maintainers and developers to let us know what kind of information they need and how to get it. I can't promise it will all be implemented overnight, but if you shout loud enough...

One thing we're struggling with right now is the normalization of stack traces, i.e. deciding which functions are important and which are not. For example, kernel stack traces often contain a lot of warn_* functions and only a few frames that actually differ, and our logic flags them as duplicates because the traces look so similar. We're dealing with this problem, but it's a slow process, because making such decisions requires knowledge of the specific program, and we would appreciate any help with this.
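
To give an idea of the direction, the noise filtering we're experimenting with
looks roughly like this (the blacklist is illustrative, not what we actually ship):

    # Frames that show up in almost every kernel warning and carry no
    # information about the actual bug -- example entries only.
    NOISE_FRAMES = {
        "warn_slowpath_common",
        "warn_slowpath_fmt",
        "warn_slowpath_null",
        "dump_stack",
    }

    def normalize(frames):
        """Keep only the frames that can distinguish one crash from another."""
        return tuple(f for f in frames if f not in NOISE_FRAMES)

    def looks_like_duplicate(frames_a, frames_b, depth=6):
        """Compare only the top of the normalized stacks."""
        return normalize(frames_a)[:depth] == normalize(frames_b)[:depth]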

Regards,
Jirka

On 02/20/2013 02:09 AM, Dave Jones wrote:
On Tue, Feb 19, 2013 at 10:10:38PM +0100, Jiri Moskovcak wrote:

  > >>So if you want to hack this into a tool for use on kernel bugs, go for
  > >>it.
  > >...and please integrate with abrt! Let's have it all working together :)
  >
  > - I am all for it, the abrt server is exactly the place where these
  > kind of things should be

What I have in mind are the cases where some human interaction is still necessary.

Adding heuristics on the server side for certain cases would help us, but
there are still a bunch of common operations we do that require a human
to make a judgment call before we make a change.

But, pursuing the server-side solution, here are some things that we'd find useful
that *could* be automated.

- Unlike most packages, we have individual maintainers for subcomponents
   (this is where our bugzilla implementation sucks, because we can't file
    by subcomponent).  So when we get bugs against certain drivers,
    or filesystems etc, we reassign to those developers who signed up to work
    on those.
   This probably accounts for a significant percentage of our interactions with
   bugzilla.  I'm not sure what kind of heuristics you'd need to add to automate
   assigning to the right person.  Maybe you can pull the symbol from the IP,
   translate that to a filename, and have a database of wildcards so you can do
   things like..
    drivers/net/wireless/* -> linville@
    fs/btrfs/* -> zab@
    etc..

   Because it's not always easy from a report to tell what component is responsible,
   sometimes parsing the Summary is necessary, which is the sort of thing
   I meant by 'needs human to make a judgment call'.  But if we can automate
   the majority of the cases, it would still help a lot.
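
   Something along these lines maybe, where the table is obviously just an
   example and the addresses are placeholders:

      import fnmatch

      # Example table only -- the real one would live in a config file or database.
      SUBCOMPONENT_OWNERS = [
          ("drivers/net/wireless/*", "wireless-maintainer@example.com"),
          ("fs/btrfs/*",             "btrfs-maintainer@example.com"),
      ]

      def owner_for_file(path):
          """Map a source file (resolved from the oops IP) to a subcomponent owner."""
          for pattern, owner in SUBCOMPONENT_OWNERS:
              if fnmatch.fnmatch(path, pattern):
                  return owner
          return None   # fall back to the default kernel assignee

      # e.g. owner_for_file("fs/btrfs/extent_io.c") -> "btrfs-maintainer@example.com"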

- Similar to the previous one, but all graphics bugs get reassigned by us
   immediately to xorg-x11-drv-* because those guys deal with both the X and
   kernel modesetting/dri code. So any trace with 'i915', 'radeon' etc
   can probably be auto-reassigned.
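
   i.e. something as simple as (the driver list is just illustrative):

      # Kernel modesetting drivers whose oopses should go to the graphics folks.
      GRAPHICS_DRIVERS = ("i915", "radeon", "nouveau")

      def is_graphics_oops(trace_text):
          """True if the trace mentions one of the KMS drivers."""
          return any(drv in trace_text for drv in GRAPHICS_DRIVERS)

      # if is_graphics_oops(trace): reassign to the matching xorg-x11-drv-* component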

- When we get 'general protection fault' bugs, it's useful to run the Code:
   line of the oops through scripts/decodecode (from a kernel tree).
   This disassembly will allow us to see what instruction caused the GPF.
   (Note: *just* general protection faults, not every trace.  Also, we
    only really need the faulting instruction, not the whole disassembly).
   Bonus points if it can suck the relevant data out of the debuginfo rpms
   to map the code line to C code.
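
   Roughly like this, assuming a kernel tree is unpacked somewhere on the server
   (the path is made up; decodecode reads the oops text on stdin and marks the
   trapping instruction in its output):

      import subprocess

      def decode_code_line(code_line, kernel_tree="/usr/src/kernels/linux"):
          """Run the oops 'Code:' line through scripts/decodecode.

          The output marks the faulting instruction, which is the only part
          we really need.
          """
          return subprocess.check_output(
              [kernel_tree + "/scripts/decodecode"],
              input=code_line.encode()).decode()

      # decode_code_line("Code: 89 d8 83 e0 03 ...")   # trimmed example

   (eu-addr2line against the matching kernel debuginfo would probably cover the
   bonus part, but I haven't tried wiring that up.)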

- Extrapolating from the above, when we see certain register values in those
   bugs, they usually hint at the cause of a bug. For example 0x6b6b6b6b is
   SLAB_POISON, and usually means we tried to use memory after it was freed.
   Adding a comment to point this out speeds up analysis.
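
   A lookup table would probably do; the 0x6b pattern is the one I mentioned,
   anything further should be taken from include/linux/poison.h for the kernel
   in question:

      # Register values that are really kernel poison patterns.
      # Example entries only -- extend from include/linux/poison.h.
      POISON_HINTS = {
          0x6b6b6b6b:         "SLAB poison: memory used after it was freed",
          0x6b6b6b6b6b6b6b6b: "SLAB poison (64-bit): memory used after it was freed",
      }

      def poison_hint(reg_value):
          """Return a human-readable hint if a register holds a known poison value."""
          return POISON_HINTS.get(reg_value)

      # e.g. poison_hint(0x6b6b6b6b) -> "SLAB poison: memory used after it was freed"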

- Getting trickier..  We see a *lot* of flaky hardware, where we tried to
   dereference an address which had a single bit flip in memory.
   If the server side had some smarts so it knew what 'good' addresses looked like,
   it could detect the single bit-flip case and guide the user to run
   memtest86, which would save us a round-trip.
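
   The comparison itself is cheap; the hard part is deciding what counts as a
   'good' address (kernel symbols, valid slab objects, ...). A rough sketch,
   assuming we already have such a set:

      def single_bit_flip_matches(bad_addr, good_addrs):
          """Return the 'good' addresses that differ from bad_addr by exactly one bit."""
          matches = []
          for good in good_addrs:
              diff = bad_addr ^ good
              # a non-zero power-of-two difference means exactly one flipped bit
              if diff != 0 and diff & (diff - 1) == 0:
                  matches.append(good)
          return matches

      # any match -> suggest the reporter runs memtest86 before we dig further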

That's all I have right now, but there are probably a bunch of other
common operations we do which could be automated.

	Dave


--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel


