Hey folks,

a few times now, my system has locked up after a read error from a hard disk. It looks like the MD code that handles the read error goes wrong and causes a general protection fault (GPF). After this happens, the system becomes completely unresponsive: it still responds to ping and accepts TCP connections, but no data comes out. The serial console gives no response either.

After rebooting, the disk in question showed a pending sector. In the most recent occurrence, I found that the array was also resyncing. I'm not sure whether this also happened in the earlier occurrences (but since the pending sector hadn't disappeared by the next day, I'd expect that no resync happened then).

The read errors always happened during a routine check of the array. Over the last few months, this has happened three or four times. In between, I've also seen some read errors that were handled properly without a crash.

I've included the kernel output from the most recent case below. If it helps, I could dig through my logs to find the earlier occurrences too. I've also included some details about the RAID arrays. The read error happened on sdd / md127.

Gr.
Matthijs

Oct 5 01:08:02 tika kernel: [2383614.474387] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Oct 5 01:08:02 tika kernel: [2383614.481134] ata4.00: irq_stat 0x40000008
Oct 5 01:08:02 tika kernel: [2383614.485360] ata4.00: failed command: READ FPDMA QUEUED
Oct 5 01:08:02 tika kernel: [2383614.490775] ata4.00: cmd 60/00:00:00:60:3f/04:00:04:00:00/40 tag 0 ncq 524288 in
Oct 5 01:08:02 tika kernel: [2383614.490775] res 41/40:00:ae:60:3f/00:00:04:00:00/40 Emask 0x409 (media error) <F>
Oct 5 01:08:02 tika kernel: [2383614.506945] ata4.00: status: { DRDY ERR }
Oct 5 01:08:02 tika kernel: [2383614.511213] ata4.00: error: { UNC }
Oct 5 01:08:02 tika kernel: [2383614.534796] ata4.00: configured for UDMA/133
Oct 5 01:08:02 tika kernel: [2383614.539397] sd 3:0:0:0: [sdd] Unhandled sense code
Oct 5 01:08:02 tika kernel: [2383614.544489] sd 3:0:0:0: [sdd]
Oct 5 01:08:02 tika kernel: [2383614.547885] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 5 01:08:02 tika kernel: [2383614.553790] sd 3:0:0:0: [sdd]
Oct 5 01:08:02 tika kernel: [2383614.557417] Sense Key : Medium Error [current] [descriptor]
Oct 5 01:08:02 tika kernel: [2383614.563428] Descriptor sense data with sense descriptors (in hex):
Oct 5 01:08:02 tika kernel: [2383614.569937] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 5 01:08:02 tika kernel: [2383614.577640] 04 3f 60 ae
Oct 5 01:08:02 tika kernel: [2383614.581453] sd 3:0:0:0: [sdd]
Oct 5 01:08:02 tika kernel: [2383614.584856] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 5 01:08:02 tika kernel: [2383614.591847] sd 3:0:0:0: [sdd] CDB:
Oct 5 01:08:02 tika kernel: [2383614.595727] Read(10): 28 00 04 3f 60 00 00 04 00 00
Oct 5 01:08:02 tika kernel: [2383614.657093] general protection fault: 0000 [#1] SMP
Oct 5 01:08:02 tika kernel: [2383614.660531] Modules linked in: joydev fuse btrfs raid6_pq zlib_deflate xor ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs crc32c libcrc32c reiserfs ext2 efivars xt_CLASSIFY xt_helper xt_mac xt_mark nf_conntrack_netlink tcp_diag inet_diag xt_multiport ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ip6table_mangle ip6table_filter ip6_tables ipt_REJECT xt_nat xt_tcpudp xt_connmark xt_conntrack xt_NFLOG nfnetlink_log nfnetlink xt_limit ipt_rpfilter iptable_raw veth nf_nat_ftp nf_conntrack_ftp xt_state iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables bridge stp llc ext4 crc16 jbd2 ipmi_si ipmi_devintf ipmi_msghandler loop hid_generic usbhid hid iTCO_wdt iTCO_vendor_support mperf coretemp snd_pcsp e1000e snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd psmouse evdev ptp microcode i2c_core soundcore serio_raw mfd_core pps_core ehci_pci uhci_hcd ehci_hcd usbcore processor button usb_co
Oct 5 01:08:02 tika kernel: mmon thermal_sys ext3 mbcache jbd dm_mod raid1 md_mod sd_mod crc_t10dif ahci libahci libata scsi_mod
Oct 5 01:08:02 tika kernel: [2383614.660531] CPU: 3 PID: 232 Comm: md127_raid1 Not tainted 3.10-1-amd64 #1 Debian 3.10.3-1
Oct 5 01:08:02 tika kernel: [2383614.660531] Hardware name: Supermicro X7SPA-HF/X7SPA-HF, BIOS 1.2a 02/21/12
Oct 5 01:08:02 tika kernel: [2383614.660531] task: ffff880139b2e080 ti: ffff8801396a0000 task.ti: ffff8801396a0000
Oct 5 01:08:02 tika kernel: [2383614.660531] RIP: 0010:[<ffffffff8112fe9b>] [<ffffffff8112fe9b>] bio_copy_data+0x138/0x158
Oct 5 01:08:02 tika kernel: [2383614.660531] RSP: 0018:ffff8801396a1d20 EFLAGS: 00010286
Oct 5 01:08:02 tika kernel: [2383614.660531] RAX: ffff88005b451980 RBX: 0000000000000006 RCX: 0000000000001000
Oct 5 01:08:02 tika kernel: [2383614.660531] RDX: ffff880112510a80 RSI: db73880000000006 RDI: ffff88010319c000
Oct 5 01:08:02 tika kernel: [2383614.660531] RBP: 0000000000001000 R08: ffff880112510a00 R09: ffff88005b451800
Oct 5 01:08:02 tika kernel: [2383614.660531] R10: ffff8801396a0000 R11: 0000000000000000 R12: 0000160000000000
Oct 5 01:08:02 tika kernel: [2383614.660531] R13: ffff8801396a1fd8 R14: 6db6db6db6db6db7 R15: ffff880000000000
Oct 5 01:08:02 tika kernel: [2383614.660531] FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
Oct 5 01:08:02 tika kernel: [2383614.660531] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 5 01:08:02 tika kernel: [2383614.660531] CR2: 00007fffc589aa80 CR3: 0000000125a83000 CR4: 00000000000007e0
Oct 5 01:08:02 tika kernel: [2383614.660531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 5 01:08:02 tika kernel: [2383614.868070] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 5 01:08:02 tika kernel: [2383614.868070] Stack:
Oct 5 01:08:02 tika kernel: [2383614.868070] ffff88010319c000 ffff88011dc91240 ffff88011dc91278 ffff88013952e780
Oct 5 01:08:02 tika kernel: [2383614.868070] ffff880112510a00 ffff8801394cb800 ffff88013952e780 ffffffffa0063785
Oct 5 01:08:02 tika kernel: [2383614.868070] ffffffff8105eaf0 ffff88013aecd040 6db6db6db6db6db7 ffff880000000000
Oct 5 01:08:02 tika kernel: [2383614.868070] Call Trace:
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffffa0063785>] ? raid1d+0x6cd/0xacc [raid1]
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff8105eaf0>] ? mmdrop+0xd/0x1c
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffffa00af0d0>] ? md_thread+0x114/0x132 [md_mod]
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff81057c48>] ? abort_exclusive_wait+0x79/0x79
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffffa00aefbc>] ? signal_pending+0x10/0x10 [md_mod]
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff810572f0>] ? kthread+0x7d/0x85
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff810408ef>] ? do_exit+0x901/0x918
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff81057273>] ? __kthread_parkme+0x59/0x59
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff8138d2bc>] ? ret_from_fork+0x7c/0xb0
Oct 5 01:08:02 tika kernel: [2383614.868070] [<ffffffff81057273>] ? __kthread_parkme+0x59/0x59
Oct 5 01:08:02 tika kernel: [2383614.868070] Code: fe 03 49 0f af ce 49 0f af f6 48 c1 e1 0c 4c 01 f9 48 c1 e6 0c 48 01 cf 4c 01 fe 89 e9 48 89 3c 24 8b 78 0c 48 01 fe 48 8b 3c 24 <f3> a4 41 ff 4a 1c 41 ff 4a 1c 01 eb 41 01 eb e9 0b ff ff ff 58
Oct 5 01:08:02 tika kernel: [2383614.868070] RIP [<ffffffff8112fe9b>] bio_copy_data+0x138/0x158
Oct 5 01:08:02 tika kernel: [2383614.868070] RSP <ffff8801396a1d20>
Oct 5 01:08:02 tika kernel: [2383615.003013] ---[ end trace 83bab722baa22e03 ]---
Oct 5 01:08:02 tika kernel: [2383615.007968] BUG: scheduling while atomic: md127_raid1/232/0x10000002
Oct 5 01:08:02 tika kernel: [2383615.014640] Modules linked in: joydev fuse btrfs raid6_pq zlib_deflate xor ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs crc32c libcrc32c reiserfs ext2 efivars xt_CLASSIFY xt_helper xt_mac xt_mark nf_conntrack_netlink tcp_diag inet_diag xt_multiport ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ip6table_mangle ip6table_filter ip6_tables ipt_REJECT xt_nat xt_tcpudp xt_connmark xt_conntrack xt_NFLOG nfnetlink_log nfnetlink xt_limit ipt_rpfilter iptable_raw veth nf_nat_ftp nf_conntrack_ftp xt_state iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables bridge stp llc ext4 crc16 jbd2 ipmi_si ipmi_devintf ipmi_msghandler loop hid_generic usbhid hid iTCO_wdt iTCO_vendor_support mperf coretemp snd_pcsp e1000e snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd psmouse evdev ptp microcode i2c_core soundcore serio_raw mfd_core pps_core ehci_pci uhci_hcd ehci_hcd usbcore processor button usb_co
Oct 5 01:08:02 tika kernel: mmon thermal_sys ext3 mbcache jbd dm_mod raid1 md_mod sd_mod crc_t10dif ahci libahci libata scsi_mod
Oct 5 01:08:02 tika kernel: [2383615.119035] CPU: 3 PID: 232 Comm: md127_raid1 Tainted: G D 3.10-1-amd64 #1 Debian 3.10.3-1
Oct 5 01:08:02 tika kernel: [2383615.128462] Hardware name: Supermicro X7SPA-HF/X7SPA-HF, BIOS 1.2a 02/21/12
Oct 5 01:08:02 tika kernel: [2383615.136470] 0000000000000000 ffffffff81382c1d ffffffff81386f6e 0000000000000046
Oct 5 01:08:02 tika kernel: [2383615.144420] 0000000000014040 ffff8801396a1fd8 ffff8801396a1fd8 ffff880139b2e080
Oct 5 01:08:02 tika kernel: [2383615.152521] ffff8801396a0000 000000000000000b 0000000000000246 ffff8801396a1fd8
Oct 5 01:08:02 tika kernel: [2383615.160479] Call Trace:
Oct 5 01:08:02 tika kernel: [2383615.163228] [<ffffffff81382c1d>] ? __schedule_bug+0x42/0x4f
Oct 5 01:08:02 tika kernel: [2383615.169195] [<ffffffff81386f6e>] ? __schedule+0x85/0x532
Oct 5 01:08:02 tika kernel: [2383615.174893] [<ffffffff810606fc>] ? __cond_resched+0x1d/0x26
Oct 5 01:08:02 tika kernel: [2383615.180883] [<ffffffff81387464>] ? _cond_resched+0x10/0x18
Oct 5 01:08:02 tika kernel: [2383615.186732] [<ffffffff81386bd2>] ? down_read+0x9/0x19
Oct 5 01:08:02 tika kernel: [2383615.192207] [<ffffffff8104b6f4>] ? exit_signals+0x1a/0x110
Oct 5 01:08:02 tika kernel: [2383615.198080] [<ffffffff810400fb>] ? do_exit+0x10d/0x918
Oct 5 01:08:02 tika kernel: [2383615.203594] [<ffffffff81382641>] ? printk+0x4f/0x51
Oct 5 01:08:02 tika kernel: [2383615.208859] [<ffffffff81388f91>] ? oops_end+0xa9/0xae
Oct 5 01:08:02 tika kernel: [2383615.214327] [<ffffffff81388568>] ? general_protection+0x28/0x30
Oct 5 01:08:02 tika kernel: [2383615.220824] [<ffffffff8112fe9b>] ? bio_copy_data+0x138/0x158
Oct 5 01:08:02 tika kernel: [2383615.226937] [<ffffffffa0063785>] ? raid1d+0x6cd/0xacc [raid1]
Oct 5 01:08:02 tika kernel: [2383615.233105] [<ffffffff8105eaf0>] ? mmdrop+0xd/0x1c
Oct 5 01:08:02 tika kernel: [2383615.238324] [<ffffffffa00af0d0>] ? md_thread+0x114/0x132 [md_mod]
Oct 5 01:08:02 tika kernel: [2383615.244801] [<ffffffff81057c48>] ? abort_exclusive_wait+0x79/0x79
Oct 5 01:08:02 tika kernel: [2383615.251334] [<ffffffffa00aefbc>] ? signal_pending+0x10/0x10 [md_mod]
Oct 5 01:08:02 tika kernel: [2383615.258046] [<ffffffff810572f0>] ? kthread+0x7d/0x85
Oct 5 01:08:02 tika kernel: [2383615.263448] [<ffffffff810408ef>] ? do_exit+0x901/0x918
Oct 5 01:08:02 tika kernel: [2383615.269032] [<ffffffff81057273>] ? __kthread_parkme+0x59/0x59
Oct 5 01:08:02 tika kernel: [2383615.275217] [<ffffffff8138d2bc>] ? ret_from_fork+0x7c/0xb0
Oct 5 01:08:02 tika kernel: [2383615.281102] [<ffffffff81057273>] ? __kthread_parkme+0x59/0x59
Oct 5 01:08:02 tika kernel: [2383615.287286] note: md127_raid1[232] exited with preempt_count 2
Oct 5 01:08:02 tika kernel: [2383615.293427] BUG: scheduling while atomic: md127_raid1/232/0x10000002
Oct 5 01:08:03 tika kernel: [2383615.300072] Modules linked in: joydev fuse btrfs raid6_pq zlib_deflate xor ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs crc32c libcrc32c reiserfs ext2 efivars xt_CLASSIFY xt_helper xt_mac xt_mark nf_conntrack_netlink tcp_diag inet_diag xt_multiport ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ip6table_mangle ip6table_filter ip6_tables ipt_REJECT xt_nat xt_tcpudp xt_connmark xt_conntrack xt_NFLOG nfnetlink_log nfnetlink xt_limit ipt_rpfilter iptable_raw veth nf_nat_ftp nf_conntrack_ftp xt_state iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables bridge stp llc ext4 crc16 jbd2 ipmi_si ipmi_devintf ipmi_msghandler loop hid_generic usbhid hid iTCO_wdt iTCO_vendor_support mperf coretemp snd_pcsp e1000e snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd psmouse evdev ptp microcode i2c_core soundcore serio_raw mfd_core pps_core ehci_pci uhci_hcd ehci_hcd usbcore processor button usb_co
Oct 5 01:08:03 tika kernel: mmon thermal_sys ext3 mbcache jbd dm_mod raid1 md_mod sd_mod crc_t10dif ahci libahci libata scsi_mod
Oct 5 01:08:03 tika kernel: [2383615.404198] CPU: 1 PID: 232 Comm: md127_raid1 Tainted: G D W 3.10-1-amd64 #1 Debian 3.10.3-1
Oct 5 01:08:03 tika kernel: [2383615.413670] Hardware name: Supermicro X7SPA-HF/X7SPA-HF, BIOS 1.2a 02/21/12

(This is where the log ends, probably things went FUBAR at this point)

matthijs@tika:~$ sudo mdadm --misc -D /dev/md126
/dev/md126:
        Version : 1.2
  Creation Time : Thu Apr 10 19:04:30 2014
     Raid Level : raid1
     Array Size : 104791936 (99.94 GiB 107.31 GB)
  Used Dev Size : 104791936 (99.94 GiB 107.31 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Oct 5 11:17:35 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : tika:ssd (local to host tika)
           UUID : 16e28bb4:db7d5c69:81cb1b64:ae4f760d
         Events : 125

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

matthijs@tika:~$ sudo mdadm --misc -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Tue Sep 21 12:30:00 2010
     Raid Level : raid1
     Array Size : 488384536 (465.76 GiB 500.11 GB)
  Used Dev Size : 488384536 (465.76 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Oct 5 11:12:55 2014
          State : active, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

  Resync Status : 16% complete

           Name : tika:hdd (local to host tika)
           UUID : f201f80b:8c1e6f0b:ccc25f2e:98fdccc6
         Events : 922

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
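
For reference, the failing sector can be read off the kernel log itself. The following is only a minimal sketch, assuming the usual libata layout of the "res" taskfile dump (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the standard READ(10) CDB layout; the function names are made up for illustration, and the two input strings are copied from the log.

```python
# Input strings copied from the kernel log in this report.
RES_LINE = "res 41/40:00:ae:60:3f/00:00:04:00:00/40 Emask 0x409 (media error)"
READ10_CDB = "28 00 04 3f 60 00 00 04 00 00"


def decode_res_lba(line):
    """Return the 48-bit LBA from a libata 'res' taskfile dump.

    Assumed field order (not stated in the original report):
    status/error:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    taskfile = line.split()[1]
    _status, low, high, _device = taskfile.split("/")
    _error, _nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hfeat, _hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in high.split(":"))
    return (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal


def decode_read10(cdb):
    """Return (start_lba, sector_count) from a SCSI READ(10) CDB given as hex."""
    raw = bytes.fromhex(cdb)
    if len(raw) != 10 or raw[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start = int.from_bytes(raw[2:6], "big")   # bytes 2..5: logical block address
    count = int.from_bytes(raw[7:9], "big")   # bytes 7..8: transfer length
    return start, count


if __name__ == "__main__":
    bad = decode_res_lba(RES_LINE)
    start, count = decode_read10(READ10_CDB)
    print(f"failing LBA: {bad} (0x{bad:x})")
    print(f"request: {count} sectors from LBA {start}, failure at offset {bad - start}")
```

Under those assumptions the failing LBA comes out as 0x43f60ae, which matches the last four bytes of the sense data (04 3f 60 ae). Note that this is an LBA on the whole disk (sdd), not an offset within sdd1 or md127.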