Re: Wayland is a disaster

Hi

On Mon, Jan 22, 2018 at 4:54 PM, Adam Williamson <adamwill@xxxxxxxxxxxxxxxxx> wrote:
On Mon, 2018-01-22 at 07:16 -0500, Christian Fredrik Schaller wrote:
> Sorry for responding to myself here, but I thought it could also be
> worthwhile to mention that one of our primary tools for identifying
> problems is the Fedora ABRT server. Looking at the current stats it looks
> to me like F27 is actually doing better than F26 used to in
> terms of minimizing crashers: https://goo.gl/babuJx
>
> There is always a chance ABRT is not catching the issues of course for some
> reason,

Well, see my mails from last month or so to desktop@ . There's several
problems with how abrt interacts with GNOME and Wayland; I'm not sure
to what extent these distort the figures.

First problem: abrt considers *lots* of actually-unrelated crashes to
be duplicates, because their tracebacks look similar - this happens
because glib has a special 'logging' function which actually means
(more or less) 'die intentionally, with this log message'. abrt tends
to interpret many bugs that crash along that path as dupes of each
other, even if the actual cause of the crash - whatever triggers that
special log message call - is different in each case. I've filed a
couple of variants of this at:
https://bugzilla.redhat.com/show_bug.cgi?id=1509086

Second problem: I *think* there's a similar issue with the recently-
introduced `dump_gjs_stack_on_signal_handler` path; I've found at least
some cases of apparently-unrelated bugs being marked as dupes due to
that path. Details:
https://github.com/abrt/satyr/issues/272

Third problem: abrt doesn't do a very good job of reporting any crash
that's caused by Xwayland dying. All you get is a backtrace that
basically tells you "Xwayland crashed", but no useful information about
why. Sometimes the system log extract that abrt captures happens to
shed some light on the reason, but sometimes it doesn't. Details:
https://github.com/abrt/satyr/issues/271

I did some cleanup on false dupes and things caused by these problems,
but it's necessarily incomplete, and I know more dupes have been filed
since I did the cleanup...

Regarding the characterization of issues with Wayland, there is a bit of history behind all this, and there are a couple of things to consider as well.

With GNOME on Wayland, gnome-shell/mutter is the display server, but it still depends on Xwayland to run [1] and cannot survive a crash in Xwayland.

Xwayland is an X server for X11 clients, but it is also a Wayland client, so if gnome-shell/mutter crashes, Xwayland loses its connection to the Wayland compositor and dies as well.

So both components (gnome-shell/mutter and Xwayland) are tightly coupled, and neither can survive the other one crashing (in GNOME).

That alone sometimes makes root-causing an issue after the fact, automatically or even manually, a bit of a challenge: one first has to determine which of the two components died first and took the other with it.
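
To illustrate the "who died first" question, here is a minimal sketch in Python. It assumes you have already collected crash records as (timestamp, process name) pairs, for example from the journal or from abrt; the function name, the 10-second window and the record format are made up for illustration and are not part of any existing tool. Keep in mind that recorded timestamps may reflect when a dump was written rather than the exact order in which the processes died, so treat the result as a hint only.

```python
from datetime import datetime, timedelta

# The two tightly coupled components discussed above.
COUPLED = {"gnome-shell", "Xwayland"}


def group_incident(events, window=timedelta(seconds=10)):
    """Guess which coupled component crashed first.

    `events` is an iterable of (timestamp, process_name) tuples.
    Returns (first_name, followers), where `followers` lists the other
    coupled component if it crashed within `window` of the first one.
    Returns (None, []) if neither component appears in `events`.
    """
    # Keep only the coupled components and order them by time.
    relevant = sorted((ts, name) for ts, name in events if name in COUPLED)
    if not relevant:
        return None, []

    first_ts, first_name = relevant[0]
    followers = [
        name
        for ts, name in relevant[1:]
        if name != first_name and ts - first_ts <= window
    ]
    return first_name, followers


if __name__ == "__main__":
    # Made-up records, purely for demonstration.
    sample = [
        (datetime(2018, 1, 22, 16, 0, 5), "Xwayland"),
        (datetime(2018, 1, 22, 16, 0, 3), "gnome-shell"),
        (datetime(2018, 1, 22, 16, 0, 4), "firefox"),
    ]
    print(group_incident(sample))
```

With the made-up records above, gnome-shell is reported as the likely first crash and Xwayland as the component it took down with it.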

To make things slightly more challenging, Xwayland would not generate a core file on a crash, just a self-generated backtrace that could be found in journalctl. So in some cases it would be almost impossible to tell why the Wayland session crashed, as no core file for Xwayland would be available (and the self-generated backtrace is rarely of much help, sadly).

So gnome-shell/mutter added “-core” to the Xwayland command line (Xwayland being started automatically by gnome-shell/mutter) so that we could capture a core file every time Xwayland crashed [2].

Unfortunately, using “-core” instructs Xwayland to generate a core file each time a fatal error occurs, and losing the connection to the Wayland compositor is a fatal error for Xwayland. So now each time gnome-shell/mutter crashes, we also get a core file for Xwayland and get reports about a bug in Xwayland, whereas the issue comes from gnome-shell/mutter. That alone generates a lot of false positives for Xwayland, and a lot of duplicates (the backtrace usually contains “xwl_read_events()”).
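
As a rough illustration of how one could triage such reports by hand, here is a small Python sketch that flags Xwayland backtraces mentioning xwl_read_events as probable fallout from a compositor crash. The only thing taken from the above is the function name xwl_read_events; the helper names and the sample backtraces are made up, and a match is only a heuristic, not proof that gnome-shell/mutter was at fault.

```python
def is_probably_secondary(backtrace_text, marker="xwl_read_events"):
    """Return True if any line of the backtrace mentions `marker`.

    Heuristic only: a hit suggests Xwayland aborted because it lost
    its connection to the Wayland compositor.
    """
    return any(marker in line for line in backtrace_text.splitlines())


def partition_reports(reports):
    """Split {report_id: backtrace_text} into two sorted lists of ids:
    (probably_secondary, needs_a_closer_look)."""
    secondary, other = [], []
    for report_id, text in reports.items():
        if is_probably_secondary(text):
            secondary.append(report_id)
        else:
            other.append(report_id)
    return sorted(secondary), sorted(other)


if __name__ == "__main__":
    # Made-up backtraces; the frame layout is illustrative only.
    reports = {
        "report-a": "#0 abort\n#1 FatalError\n#2 xwl_read_events\n#3 main",
        "report-b": "#0 some_other_function\n#1 main",
    }
    secondary, other = partition_reports(reports)
    print("probably caused by the compositor dying:", secondary)
    print("worth a closer look:", other)
```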

The way to solve that problem is to change Xwayland to not call FatalError() when the Wayland compositor dies, so that no core is generated in this case. A patch for this has already landed in the xserver master branch upstream [3].

With this, we should get a core file for “real” crashes but not when Xwayland is aborting because the Wayland compositor (gnome-shell/mutter) has crashed. Hopefully that will help with a better characterization of Wayland issues in the future.

HTH,

Cheers,
Olivier
