What a "frozen" X server state can be, and a HowTo for getting kernel crash dumps

Hello,
 
A few days ago, someone on the list wrote that he would configure netconsole to capture messages or debug data. That will not work: when the box freezes, its network is dead as well.

I'm still busy with my video driver(s) and kernels. I can now reliably and repeatedly "freeze" the 4.3.5 and 4.5.0rc2 kernels, as well as the drm/intel tree (based on 4.5.0rc2) that I cloned last week. I am also able to get debug traces now. Reproducible issues and some traces can be helpful for debugging the code, so I am sharing my findings as a quick HowTo.
 
WARNING: "freezes" can kill your file systems. So can this procedure (reboot without sync); use it at your own risk. Fedora and the kernels I'm using recovered mine each time, after many "freezes" or crashes and such reboots.

As a reminder, my hardware is a Shuttle XSV35V4 (Celeron, BIOS XS35V400.400). The video device is:

# lspci -nnk | egrep -iA3 "VGA"
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display [8086:0f31] (rev 0e)
DeviceName: Onboard IGD
Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device [1297:4019]
Kernel driver in use: i915

I just boot these different kernels, with or without kernel debug options. Then I open a gnome session and play Youtube videos in Firefox. Sometimes, for quicker crashes and dumps, I also use Chrome and Netflix, but Firefox and Youtube are enough to kill the box, within hours and even within minutes. Play 3 video streams/channels/playlists on two monitors, switch the videos to full screen, then back to the browser, and so on. This "freezes" the box, whichever standard kernel is running. It is almost always reproducible and takes more or less time: some minutes, or a couple of hours when the videos just play, with acceleration+DRI active. This is what I get with this Shuttle XSV35V4.

I never noticed any message related to the freezes in my syslog/journal, nor on the PC screen, nor over SSH, nor over my netconsole. But I can now dump some useful debug data with the following tools and steps.
 
Set up SysRq and kdump by following the usual procedures, and check that you actually get your dumps. Here is a tutorial for Fedora:
https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
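
Before relying on the procedure, it may be worth checking that SysRq and kdump are really armed. The following is only a minimal sketch and not part of the Fedora tutorial: the helper names has_crashkernel and sysrq_allows are mine, and I am assuming the usual kernel.sysrq bitmask values (4 for keyboard control such as "r", 8 for debugging dumps such as "c"). Please check the sysrq documentation of your own kernel.

```shell
#!/bin/sh
# Read-only sanity checks for SysRq and kdump; nothing is changed on the system.

# Return 0 if the kernel command line in $1 reserves memory for a crash kernel.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Return 0 if the kernel.sysrq value $1 permits the function bit $2.
# A value of 1 enables every function; larger values are a bitmask.
# ASSUMPTION: bit 4 = keyboard control (r), bit 8 = debugging dumps (c).
sysrq_allows() {
    [ "$1" -eq 1 ] && return 0
    [ $(( $1 & $2 )) -ne 0 ]
}

if [ -r /proc/sys/kernel/sysrq ]; then
    v=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows "$v" 4; then echo "Alt+SysRq+r: allowed"; else echo "Alt+SysRq+r: blocked (kernel.sysrq=$v)"; fi
    if sysrq_allows "$v" 8; then echo "Alt+SysRq+c: allowed"; else echo "Alt+SysRq+c: blocked (kernel.sysrq=$v)"; fi
fi

if [ -r /proc/cmdline ]; then
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel= is on the kernel command line"
    else
        echo "no crashkernel= on the kernel command line"
    fi
fi

# 1 means a crash (kdump) kernel is loaded and ready.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi
```

If a key is reported as blocked, running "sysctl kernel.sysrq=1" as root enables all SysRq functions until the next reboot.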

For the next steps, use two keyboards. I'm using an ordinary Cherry USB keyboard and a Logitech wireless keyboard. I assume any pair of keyboards will do the job.
 
Why two keyboards? One will interact with X, the other with your kernel. Once in gnome, you will have two active keyboards. On the USB keyboard, press Alt+SysRq+r. The USB keyboard is now detached from X and interacts with the kernel (check this in your /var/log/messages or with journalctl -f). Detach it immediately after X/gnome and the session have started: the keyboard cannot be detached later, once the box is "frozen". Now set the USB keyboard aside for later. Or test that it works: press Alt+SysRq+c on the detached keyboard and check that you get your core dump...
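
To test the kdump side alone, while the box is still alive, the same crash can also be triggered from a root shell through /proc/sysrq-trigger. This is just a sketch: the function name sysrq_trigger and its optional second argument are my own additions so that it can be tried against an ordinary file first. It is of no use once the box is frozen, which is why the detached keyboard is needed.

```shell
#!/bin/sh
# Write one SysRq command letter to the trigger file.
#   $1: command letter, for example c (crash) or s (sync)
#   $2: trigger path; defaults to the real /proc/sysrq-trigger
sysrq_trigger() {
    printf '%s\n' "$1" > "${2:-/proc/sysrq-trigger}"
}

# Dry run against a scratch file: shows what would be written, harms nothing.
scratch=$(mktemp)
sysrq_trigger c "$scratch"
echo "would write: $(cat "$scratch")"
rm -f "$scratch"

# Real run, as root. This crashes the kernel IMMEDIATELY, without sync:
# sysrq_trigger c
```

As far as I know, writes to /proc/sysrq-trigger are not filtered by the kernel.sysrq setting, so this tests kdump but not the keyboard path.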

Now use your second keyboard and mouse to interact with X. Play videos with acceleration+DRI active in X. The box will die. Or maybe play your favorite game and try the same process...

When the box is dead, take the USB keyboard you detached earlier and press Alt+SysRq+c on it. X/gnome will shut down and the core gets dumped (the magic key "c").


Along with your dumped vmcore, you will get a text file containing the dmesg content (kernel messages, from boot until the core dump):

127.0.0.1-2016-02-10-19:20:47]# ls -al
total 109408
drwxr-xr-x   2 root root      4096 Feb 10 23:36 .
drwxr-xr-x. 15 root root      4096 Feb 10 23:37 ..
-rw-------   1 root root 111873588 Feb 10 19:20 vmcore
-rw-r--r--   1 root root     73270 Feb 10 19:20 vmcore-dmesg.txt
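
The vmcore-dmesg.txt file is plain text, so a first look needs no debug kernel. A small sketch for pulling out the last graphics-related lines before the crash; the function name dmesg_gpu_tail and the search pattern are only my guess at what is relevant on an i915 box:

```shell
#!/bin/sh
# Print the last $2 (default 20) lines of file $1 that mention the GPU
# driver or SysRq. ASSUMPTION: i915/drm/sysrq are the interesting keywords.
dmesg_gpu_tail() {
    grep -iE 'i915|drm|sysrq' "$1" | tail -n "${2:-20}"
}

# Run it from inside the dump directory, if the file is there.
if [ -r vmcore-dmesg.txt ]; then
    dmesg_gpu_tail vmcore-dmesg.txt
fi
```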


If you used a debug kernel, you can open the vmcore and examine the dumped data: the dmesg content, the processor run queues (runq), the ps list, and more, as they were when the dump was triggered. To read the vmcore data, you need the path to the vmlinux built with your debug kernel:

# crash /mnt/kernels/linux-4.3.5/vmlinux vmcore
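
When there are many dumps to go through, the same reports can be collected in one batch run instead of typing at the crash> prompt. This is a rough sketch under a few assumptions: the command list (log, runq, ps, bt -a) is simply my choice of standard crash commands, the report file name crash-report.txt is arbitrary, and I rely on the -i option of crash to read commands from a file.

```shell
#!/bin/sh
# Collect a few standard crash reports from a vmcore in one batch run.
# VMLINUX and VMCORE default to the paths used above; override as needed.
VMLINUX=${VMLINUX:-/mnt/kernels/linux-4.3.5/vmlinux}
VMCORE=${VMCORE:-vmcore}

# Write the crash commands to run into the file named by $1.
write_crash_cmds() {
    cat > "$1" <<'EOF'
log
runq
ps
bt -a
exit
EOF
}

cmds=$(mktemp)
write_crash_cmds "$cmds"

if command -v crash >/dev/null 2>&1 && [ -r "$VMLINUX" ] && [ -r "$VMCORE" ]; then
    # -i FILE: run the commands in FILE, the last of which quits crash.
    crash -i "$cmds" "$VMLINUX" "$VMCORE" > crash-report.txt
else
    echo "crash, vmlinux or vmcore not available; command list left in $cmds"
fi
```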

For one "freeze", I noticed that 3 CPUs were idle and that only Firefox remained active on my box... That state is shown below. Why were almost all processes idle? According to the dumped ps list, the processes were still alive, but the CPU run queues were empty. I'll investigate this further.

Best regards

crash> runq
CPU 0 RUNQUEUE: ffff88023fc16c80
  CURRENT: PID: 2913   TASK: ffff8800b7bd0000  COMMAND: "firefox"
  RT PRIO_ARRAY: ffff88023fc16e30
     [no tasks queued]
  CFS RB_ROOT: ffff88023fc16d20
     [no tasks queued]

CPU 1 RUNQUEUE: ffff88023fc96c80
  CURRENT: PID: 0      TASK: ffff880236270000  COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff88023fc96e30
     [no tasks queued]
  CFS RB_ROOT: ffff88023fc96d20
     [no tasks queued]

CPU 2 RUNQUEUE: ffff88023fd16c80
  CURRENT: PID: 0      TASK: ffff880236271c00  COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffff88023fd16e30
     [no tasks queued]
  CFS RB_ROOT: ffff88023fd16d20
     [no tasks queued]

CPU 3 RUNQUEUE: ffff88023fd96c80
  CURRENT: PID: 0      TASK: ffff880236273800  COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffff88023fd96e30
     [no tasks queued]
  CFS RB_ROOT: ffff88023fd96d20
     [no tasks queued]
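
To compare several dumps quickly, runq output like the above can be reduced to one line per CPU. A small sketch that works on saved text; the function name runq_current is mine, and it assumes the output layout shown above and command names without spaces:

```shell
#!/bin/sh
# Read saved `runq` output on stdin and print "<cpu> <command>" for the task
# that was current on each CPU. ASSUMPTION: layout as in the dump above.
runq_current() {
    awk '/^CPU [0-9]+ RUNQUEUE/ { cpu = $2 }
         /CURRENT:/ { cmd = $NF; gsub(/"/, "", cmd); print cpu, cmd }'
}

# Example with two CPUs taken from the dump above.
printf '%s\n' \
    'CPU 0 RUNQUEUE: ffff88023fc16c80' \
    '  CURRENT: PID: 2913   TASK: ffff8800b7bd0000  COMMAND: "firefox"' \
    'CPU 1 RUNQUEUE: ffff88023fc96c80' \
    '  CURRENT: PID: 0      TASK: ffff880236270000  COMMAND: "swapper/1"' |
    runq_current
```

Piping the result through grep -c ' swapper/' then counts the CPUs that were idle when the dump was taken.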
 
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
