The problem still exists in kernel 3.7-rc7. During the second backup run
after I booted kernel 3.7.0-rc7, all disk activity once again suddenly
ceased, dmesg started to report tasks "blocked for more than 120 seconds",
and nothing that happened after that point survived the reboot.

Can I do anything to help hunt this down?

On 20.11.2012 01:14, Tilman Schmidt wrote:
> For the 4th time now after switching to kernel 3.6, my system became
> unresponsive during the nightly Bacula backup run. It looks as if
> all disk accesses are suddenly blocked:
> - Desktop apps stop responding one after another, starting with
>   Firefox followed by other "heavy" apps, while Konsole windows
>   continue being usable for a while.
> - "top" shows the load average steadily increasing with no process
>   actually consuming relevant quantities of CPU.
> - I can do "dmesg > /root/dmesg.out" followed by "less /root/dmesg.out"
>   in a Konsole window just fine, but after the inevitable hard reset
>   the file /root/dmesg.out isn't there.
> - The "sync" command hangs indefinitely.
> - The "shutdown" command and Ctrl-Alt-Del emit "system going down"
>   broadcast messages but never get anywhere.
> - Killing processes manually works for some (bacula-sd even ejects
>   the tape before exiting) but most remain in state D or Z.
> - Eventually, all text consoles are blocked and a hardware reset is
>   the only remaining option.
> - After the reboot, a Bacula spool file is left behind in
>   /var/spool/bacula, proof that the hang happened during the backup.
>
> This does not happen during every backup run, but frequently enough
> to be annoying. (About once per week.) It never happened with kernel
> 3.5. For comparison I went back to kernel 3.5.7 for a week and it
> never happened during that time. Last night I booted 3.6.7 and the
> very next backup caused the hang again. The last kernel message that
> made it to the syslog on disk was
>
> Nov 19 23:05:04 xenon kernel: [73877.128546] st0: Block limits 256 - 524288 bytes.
>
> triggered by the start of the backup. In dmesg the next message was
>
> [74401.249091] INFO: task flush-253:2:1320 blocked for more than 120 seconds.
>
> followed by a backtrace. I have photos of the remaining dmesg output
> which I'll try to upload somewhere accessible tomorrow.
>
> Hardware configuration:
>   Intel Pentium D, Intel DQ965GF mainboard, 6 GB RAM
>   onboard S-ATA controller driving two 500 GB S-ATA disks
>     and a Pioneer DVR-216D DVD-RW drive
>   Adaptec 29160B Ultra160 SCSI adapter driving a
>     Tandberg TS400 LTO-2 tape drive
>
> Disk configuration: md RAID1, LVM, ext3 and ext4 volumes
>
> Software: openSUSE 11.4 64 bit, vanilla kernel 3.5.7 and 3.6.7,
>   Bacula 5.2.12
>
> HTH
> T.
>

-- 
Tilman Schmidt                    E-Mail: tilman@xxxxxxx
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
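
For anyone comparing timestamps in reports like this one, here is a minimal Python sketch that pulls the fields out of a hung-task warning. It assumes the unwrapped one-line format quoted above; the helper names `parse_hung_task` and `latest_block_start` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Matches the hung-task warning in the form quoted in the report, e.g.
# "[74401.249091] INFO: task flush-253:2:1320 blocked for more than 120 seconds."
# The task name may itself contain colons, so the greedy group takes
# everything up to the last ":<pid>".
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_task(line):
    """Return (timestamp, task name, pid, threshold) or None if no match."""
    m = HUNG_RE.search(line)
    if m is None:
        return None
    return float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"])


def latest_block_start(line):
    """Latest uptime (in seconds) at which the task can have become blocked.

    The warning only says the task was blocked for *more than* the
    threshold, so the real start may be earlier than this value.
    """
    parsed = parse_hung_task(line)
    if parsed is None:
        return None
    ts, _comm, _pid, secs = parsed
    return ts - secs


if __name__ == "__main__":
    msg = ("[74401.249091] INFO: task flush-253:2:1320 "
           "blocked for more than 120 seconds.")
    print(parse_hung_task(msg))
    print(latest_block_start(msg))
```

Applied to the message above, this puts the start of the blockage at or before uptime 74281.249091, which is roughly 404 seconds after the st0 message at 73877.128546 that marked the start of the backup.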