kernel watchdog: EIP: [<f85d42fa>] handle_stripe+0x24b/0x18d7 [raid456] SS:ESP 0068:ef189e54

Howdy,

I had a software RAID 5 (swraid) array crash on my server (kernel 3.1.0).

I cannot reproduce this, and I know I don't have the very latest kernel, but
the report might be useful, so here it is:

I removed /dev/sde without marking the drive faulty first.
Because I wasn't using the array at the time, swraid didn't notice.

When I then tried mdadm --set-faulty, it failed because my /dev/sde1
device was gone.
So, I figured I'd just access the array and let swraid figure out that the
device was gone.

When I did so, this is what happened (captured on serial console, full log
below my signature):

Did the kernel watchdog trigger too quickly?
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

Thanks,
Marc

sd 8:2:0:0: rejecting I/O to offline device
end_request: I/O error, dev sde, sector 2656224
Buffer I/O error on device sde, logical block 332028
Buffer I/O error on device sde, logical block 332029
Buffer I/O error on device sde, logical block 332030
Buffer I/O error on device sde, logical block 332031
Buffer I/O error on device sde, logical block 332032
Buffer I/O error on device sde, logical block 332033
Buffer I/O error on device sde, logical block 332034
Buffer I/O error on device sde, logical block 332035
sd 8:2:0:0: rejecting I/O to offline device
ata9.02: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
ata9.02: SError: { PHYRdyChg CommWake DevExch }
ata9.00: revalidation failed (errno=-5)
ata9.03: revalidation failed (errno=-5)
md/raid:md5: Disk failure on sde1, disabling device.
md/raid:md5: Operation continuing on 4 devices.
BUG: unable to handle kernel NULL pointer dereference at 00000070
IP: [<f85d42fa>] handle_stripe+0x24b/0x18d7 [raid456]
*pdpt = 0000000012dd3001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 
Modules linked in: ppdev lp tun autofs4 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sata_mv kl5kusb105 ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_REJECT xt_state xt_tcpudp ipt_LOG iptable_mangle iptable_filter ipv6 deflate zlib_deflate ctr twofish_generic twofish_i586 twofish_common camellia serpent cast5 des_generic cryptd aes_i586 aes_generic xcbc rmd160 sha512_generic sha256_generic crypto_null af_key isofs fuse blowfish cbc dm_crypt dm_mirror dm_region_hash dm_log lm85 hwmon_vid dm_snapshot dm_mod iptable_nat ip_tables nf_conntrack_ftp ipt_MASQUERADE nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 x_tables nf_conntrack sg st snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_cmipci snd_opl3_lib snd_ens1371 snd_hwdep gameport snd_mpu401_uart snd_seq_midi snd_rawmidi snd_pcm_oss snd_ac97_codec ac97_bus snd_mixer_oss snd_pcm eeepc_wmi asus_wmi snd_seq_dummy rfkill snd_seq_oss snd_seq_midi_event snd_seq video pl2303 ati_remote usbserial pci_hotplug backlight snd_timer snd_seq_device wmi pcspkr processor parport_pc thermal_sys r8169 hwmon parport evdev button xhci_hcd intel_agp ehci_hcd sata_sil24 intel_gtt agpgart snd rtc_cmos i2c_i801 tpm_tis usbcore soundcore snd_page_alloc [last unloaded: kl5kusb105]

Pid: 6112, comm: md5_raid5 Not tainted 3.1.0-core2-volpreempt-noide-hm64-20111109 #1 System manufacturer System Product Name/P8H67-M PRO
EIP: 0060:[<f85d42fa>] EFLAGS: 00010002 CPU: 2
EIP is at handle_stripe+0x24b/0x18d7 [raid456]
EAX: 00008301 EBX: eed48ccc ECX: f0e0b128 EDX: 00008301
ESI: 00000000 EDI: eed48aa0 EBP: ef189f18 ESP: ef189e54
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process md5_raid5 (pid: 6112, ti=ef188000 task=f1888c80 task.ti=ef188000)
Stack:
 f59eac40 b07a6112 c06018e4 000454a9 c01f286f eed48ac8 f146a2a0 00008c3b
 ef189e88 00000010 ef6e2ab0 f0e0b000 ef189ea4 00000005 00000004 f0e0b000
 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000
Call Trace:
 [<c01f286f>] ? release_sysfs_dirent+0x82/0x99
 [<f85d1573>] ? release_stripe+0x31/0x37 [raid456]
 [<f85d5d22>] raid5d+0x39c/0x3e7 [raid456]
 [<c0430a4d>] ? schedule+0x48/0x4a
 [<c0430cf2>] ? schedule_timeout+0x23/0x182
 [<c014504b>] ? finish_wait+0x44/0x49
 [<c03845ba>] md_thread+0xcf/0xe6
 [<c0144f96>] ? abort_exclusive_wait+0x61/0x61
 [<c03844eb>] ? md_register_thread+0xa6/0xa6
 [<c0144b2f>] kthread+0x62/0x67
 [<c0144acd>] ? kthread_worker_fn+0x10b/0x10b
 [<c043357e>] kernel_thread_helper+0x6/0xd
Code: 1c 83 c0 08 83 d2 00 3b 96 94 00 00 00 77 0f 72 08 3b 86 90 00 00 00 77 05 f0 80 4b 74 08 8b 43 74 f6 c4 80 74 21 f0 80 63 74 f7 <8b> 46 70 a8 02 75 10 c7 45 d0 01 00 00 00 f0 ff 86 98 00 00 00 
EIP: [<f85d42fa>] handle_stripe+0x24b/0x18d7 [raid456] SS:ESP 0068:ef189e54
CR2: 0000000000000070
---[ end trace 37fd70c74aeaa6d1 ]---
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
Pid: 0, comm: swapper Tainted: G      D     3.1.0-core2-volpreempt-noide-hm64-20111109 #1
Call Trace:
 [<c016c223>] ? touch_nmi_watchdog+0x52/0x52
 [<c042b8ba>] panic+0x4e/0x151
 [<c016c223>] ? touch_nmi_watchdog+0x52/0x52
 [<c016c294>] watchdog_overflow_callback+0x71/0x93
 [<c01782a9>] __perf_event_overflow+0x146/0x1b4
 [<c010c1c0>] ? x86_perf_event_set_period+0x19e/0x1a9
 [<c0178858>] perf_event_overflow+0x10/0x12
 [<c010eeb0>] intel_pmu_handle_irq+0x3da/0x42d
 [<c012d23b>] ? default_wake_function+0xb/0xd
 [<c010cebb>] perf_event_nmi_handler+0x3a/0x7c
 [<c01489f9>] notifier_call_chain+0x26/0x48
 [<c0148a3d>] atomic_notifier_call_chain+0xf/0x11
 [<c0148d4d>] notify_die+0x2d/0x30
 [<c0102be0>] do_nmi+0x58/0x245
 [<c0124ecc>] ? check_preempt_curr+0x27/0x62
 [<c0432df4>] nmi_stack_correct+0x2f/0x34
 [<c0432384>] ? _raw_spin_lock_irqsave+0x24/0x2d
 [<f85d155e>] release_stripe+0x1c/0x37 [raid456]
 [<f85d288d>] raid5_end_read_request+0x2cd/0x2ef [raid456]
 [<c0123085>] ? __enqueue_entity+0x63/0x69
 [<c0125954>] ? enqueue_task_fair+0x347/0x34f
 [<c01cdad7>] bio_endio+0x25/0x27
 [<c02688c4>] req_bio_endio.isra.34+0x98/0xa0
 [<c02689fc>] blk_update_request+0x130/0x2e4
 [<c0268bc4>] blk_update_bidi_request+0x14/0x51
 [<c02693c4>] blk_end_bidi_request+0x16/0x4e
 [<c0269406>] blk_end_request+0xa/0xc
 [<c031e72d>] scsi_io_completion+0x1b5/0x450
 [<c031e32a>] ? scsi_device_unbusy+0x76/0x7c
 [<c0318788>] scsi_finish_command+0xb9/0xc1
 [<c031e4f5>] scsi_softirq_done+0xd6/0xde
 [<c026cf18>] blk_done_softirq+0x54/0x61
 [<c0135941>] __do_softirq+0x78/0xfe
 [<c01358c9>] ? remote_softirq_receive+0x2e/0x2e
 <IRQ>  [<c0135b3d>] ? irq_exit+0x40/0x93
 [<c01039e4>] ? do_IRQ+0x7a/0x8e
 [<c0433570>] ? common_interrupt+0x30/0x38
 [<c01300e0>] ? copy_process+0x7d3/0xe68
 [<c02a4b8c>] ? intel_idle+0xbb/0xdf
 [<c0390cea>] ? cpuidle_idle_call+0x7f/0xb4
 [<c010157a>] ? cpu_idle+0x88/0xac
 [<c0414d28>] ? rest_init+0x58/0x5a
 [<c062c740>] ? start_kernel+0x325/0x32a
 [<c062c0a2>] ? i386_start_kernel+0xa2/0xaa
Rebooting in 20 seconds..
ACPI MEMORY or I/O RESET_REG.
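
A quick sanity check on the first oops, as a minimal Python sketch. It takes
the "Code:" line, the register dump and CR2 exactly as printed above, decodes
the instruction marked with <..>, and checks that base register plus
displacement equals the faulting address. The helper names are made up for
illustration, and the decoder deliberately handles only the one instruction
form seen here (opcode 8b with an 8-bit displacement); it is not a general
x86 disassembler.

```python
# Values copied verbatim from the oops above.
CODE_LINE = (
    "1c 83 c0 08 83 d2 00 3b 96 94 00 00 00 77 0f 72 08 3b 86 90 00 00 00 "
    "77 05 f0 80 4b 74 08 8b 43 74 f6 c4 80 74 21 f0 80 63 74 f7 <8b> 46 70 "
    "a8 02 75 10 c7 45 d0 01 00 00 00 f0 ff 86 98 00 00 00"
)
REGS = {
    "EAX": 0x00008301, "EBX": 0xEED48CCC, "ECX": 0xF0E0B128, "EDX": 0x00008301,
    "ESI": 0x00000000, "EDI": 0xEED48AA0, "EBP": 0xEF189F18, "ESP": 0xEF189E54,
}
CR2 = 0x00000070

# 32-bit register numbering used by the ModRM byte.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker (the faulting instruction)."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_load_disp8(insn):
    """Decode only 'MOV r32, [reg + disp8]' (opcode 8b, ModRM mod=01, no SIB)."""
    if insn[0] != 0x8B:
        raise ValueError("not opcode 8b")
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:  # rm == 4 would mean a SIB byte follows
        raise ValueError("addressing form not handled by this sketch")
    disp = insn[2] - 256 if insn[2] >= 0x80 else insn[2]  # sign-extend disp8
    return REG32[reg], REG32[rm], disp


dest, base, disp = decode_mov_load_disp8(faulting_bytes(CODE_LINE))
address = (REGS[base] + disp) & 0xFFFFFFFF
print(f"mov {dest.lower()}, [{base.lower()}+{disp:#x}] -> address {address:#x}")
print("matches CR2:", address == CR2)
```

If the decode is right, the faulting load reads offset 0x70 from a pointer
held in ESI, and ESI is 0 in the dump, which would agree with both the
"NULL pointer dereference at 00000070" message and CR2. Which structure and
field that corresponds to in handle_stripe() cannot be told from the log
alone.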




-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  