On Sat, 2012-03-24 at 11:58 -0600, Jonathan Corbet wrote:

> Here's a strange pathology that just bit me for the first time in a while,
> though I've seen it before. I'm not sure where to file a bug on this
> one...

There's several levels of "X locked up" pathology, let's see if I can
shed some light here. (For bonus points, someone who wanted to add this
kind of info to the wiki would be Way Cool.)

> In short: I'll be working away, minding my own business, when the desktop
> goes completely dead - no response to any key or mouse events. That said,
> the X server is still running; the pointer still moves with the mouse. I
> can also switch to another virtual console with alt-ctrl-Fn. Sometimes
> things start working again after some time (measured in minutes);
> sometimes I lose patience and start over. Today I went and made lunch and
> it never came back.

The pointer position (but not image) updates during a SIGIO handler if
you have hardware cursors enabled [1]. How do you know if you have
hardware cursors? Short answer is, you do, unless you're running a dumb
driver like vesa/fbdev/modesetting.

So, class 1 lockup here is "I can't move the cursor", and boy are you in
trouble. For KMS drivers this usually means X is waiting on a blocking
DRM ioctl; ps will show X in D state, and /proc/$(pidof Xorg)/wchan will
show you somewhere in ioctl land. This is always a video driver bug, and
you will typically see something in dmesg when this happens. Don't
bother trying to get an xserver backtrace here, ptrace can't attach to
D-state processes.

Class 2 lockup is "I can move the cursor, but the image never changes",
as in, if you mouse over a text entry field it doesn't change to the
vertical bar, or over a resize grip it doesn't change to a resize
indicator. Here, the X server is stuck somewhere away from the main
loop, but at least isn't stuck in the kernel. gdb on X will work, and
will probably tell you where you're stuck. This class is usually a
userspace bug, could be either the driver or the server.

Class 3 lockup is "I can move the cursor and it behaves normally, but I
can't type". In this scenario X _is_ successfully going around its main
loop. If you can VT switch, this is you; VT switch processing happens
while draining the event queue, which is driven off the main loop. This
scenario has an outside chance of being an xserver bug, but typically
this is the server dutifully doing what clients have told it to do:
something takes a grab, and then deadlocks. Sorry about X11, we keep
trying to get rid of it for a reason.

Class 3 here one could debug more readily if you had some of the
debugging key combos wired up in XKB:

http://cgit.freedesktop.org/xorg/xserver/commit/?id=7d2543a3cb3089241982ce4f8984fd723d5312a1

Sadly gnome does not yet have UI for this, and I don't remember how to
drive setxkbmap to add them. Note that the Ungrab and CloseGrab combos
allow you to defeat screensaver locking - ie, they are security holes -
which is why they're not enabled by default. You don't want to use them
anyway if you're debugging, you want PrintGrabs so you can then go
inspect the grabbing process to see why it's deadlocked.

> I've tried killing off applications to see if somebody has some sort of
> all-inclusive grab, but I can't find the right one if that's the case. I
> can kill something like Firefox and verify that the process is gone, but
> the Firefox window remains on-screen when I return to X.

This is significant. It means the compositor isn't repainting. So
either:

a) the compositor isn't the client with the stuck grab,
b) the compositor's internal grab logic is broken

[1] - Why position but not image? Because on most hardware position is
just one register to poke, but image updates require an image upload,
which isn't safe to do if the driver is in the middle of some other
accelerated rendering. Why only for hardware cursor? Because software
cursor rendering only caches the pixels behind the cursor on motion,
which means you could race with normal rendering. Both of these you
could fix if you were willing to take much more of a mutex overhead than
you're probably okay with.

- ajax
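
For anyone who wants to script the triage described above, here is a minimal sketch in Python. It is not part of any X or DRM tooling: the function names (parse_state, classify, inspect_server) and the wording of the returned strings are made up for illustration. It assumes a Linux procfs, where the process state is the field after the parenthesised command name in /proc/<pid>/stat and the wait channel is in /proc/<pid>/wchan.

```python
from pathlib import Path


def parse_state(stat_text: str) -> str:
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split on the last ')' in the line.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def classify(cursor_moves: bool, cursor_image_changes: bool) -> str:
    """Map the two symptoms you can observe by hand onto the three classes."""
    if not cursor_moves:
        return "class 1: probably blocked in a DRM ioctl, check dmesg"
    if not cursor_image_changes:
        return "class 2: stuck in userspace away from the main loop, try gdb"
    return "class 3: main loop is running, look for a client holding a grab"


def inspect_server(pid: int, proc_root: Path = Path("/proc")) -> dict:
    """Read the state and wait channel of a process from procfs."""
    base = proc_root / str(pid)
    state = parse_state((base / "stat").read_text())
    wchan = (base / "wchan").read_text().strip()
    # 'D' is uninterruptible sleep, the state ps reports for a class 1 lockup.
    return {"state": state, "wchan": wchan, "uninterruptible": state == "D"}
```

Run it from another VT or over ssh with the pid that pidof Xorg gives you. If "uninterruptible" comes back true, that matches the class 1 description and there is no point attaching gdb; otherwise classify() only restates the cursor symptoms, so you still have to look at the screen yourself.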
--
test mailing list
test@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe:
https://admin.fedoraproject.org/mailman/listinfo/test