Re: PROBLEM: kernel crashes on RAID1 drive error

Jens,

On Oct 21, 2004, at 9:02 AM, Jens Axboe wrote:

> > > -97 is the release kernel, -111 is the current update kernel. And it has
> > > those raid1 issues fixed already, at least the ones that are known. The
> > > scsi segment issue is not, however.
> >
> > Thanks. Good to know that. -111 is currently available to customers? We
> > may recommend that our customers use that, rather than patching -97
> > ourselves.
>
> Yes it is, it's generally available through the online updates.

FWIW, I tried the -111 kernel and got a crash with my failing drive. The messages out of the kernel were:


raid1: Disk failure on sdb1, disabling device.
raid1: sdb1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
raid1: sdb1: rescheduling sector 184
raid1: sda1: redirecting sector 184 to another mirror
Oct 22 10:42:03 linux kernel: scsi0: ERROR on channel 0, id 5, lun 0, CDB: Read (10) 00 00 00 00 bf 00 01 00 00
Oct 22 10:42:03 linux kernel: Info fld=0xf3, Current sdb: sense key Medium Error
Oct 22 10:42:03 linux kernel: Additional sense: Unrecovered read error
Oct 22 10:42:03 linux kernel: end_request: I/O error, dev sdb, sector 240
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
*pde = 00000000
Oops: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<c01559a4>] Tainted: G U
EFLAGS: 00010286 (2.6.5-7.111-smp)
EIP is at page_address+0x14/0xc0
eax: 00000000 ebx: 00000000 ecx: d0e50ac0 edx: f782a970
esi: f7d7cd00 edi: 00000001 ebp: 00000008 esp: f7e65e90
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 220, threadinfo=f7e64000 task=f7e1acb0)
Stack: 00000000 f7d7cd00 00000001 00000008 c0249501 c0127b7a 00000001 d0e50ac0
00000000 00000e00 c0249bee c035b0f4 f7eb5e8c 000000ef 00000000 00000001
fffffffb 00000e00 00000007 f7d7cd00 f7d7cd00 f71cce00 00000000 f7def200
Call Trace:
[<c0249501>] blk_recalc_rq_sectors+0xa1/0x110
[<c0127b7a>] printk+0x18a/0x1a0
[<c0249bee>] __end_that_request_first+0x1be/0x240
[<f883fb99>] scsi_end_request+0x29/0xe0 [scsi_mod]
[<f883ff74>] scsi_io_completion+0x324/0x4c0 [scsi_mod]
[<f883a3b2>] scsi_finish_command+0x82/0xf0 [scsi_mod]
[<c0127b7a>] printk+0x18a/0x1a0
[<f883e687>] scsi_error_handler+0x987/0xed0 [scsi_mod]
[<f883dd00>] scsi_error_handler+0x0/0xed0 [scsi_mod]
[<c0107005>] kernel_thread_helper+0x5/0x10


Code: 8b 00 f6 c4 01 75 26 a1 0c fb 47 c0 29 c3 c1 fb 05 c1 e3 0c
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
f88584be
*pde = 00000000
Oops: 0002 [#2]
SMP
CPU: 0
EIP: 0060:[<f88584be>] Tainted: G U
EFLAGS: 00010046 (2.6.5-7.111-smp)
EIP is at dump_block_silence+0x1e/0xc0 [dump_blockdev]
eax: 00000000 ebx: f7d86c00 ecx: f8875810 edx: 00000000
esi: f8859740 edi: f7e65e5c ebp: 00000000 esp: f7e65d28
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 220, threadinfo=f7e64000 task=f7e1acb0)
Stack: 00000000 00000000 00000000 00000000 00000000 00000000 f8870ae9 00000000
00000000 00000000 f8870c49 00000000 00000000 00000000 f8870d05 00000000
c0358f00 00000202 f886f852 ffffffef c010aed3 00000000 c010af28 c03552c0
Call Trace:
[<f8870ae9>] dump_begin+0x59/0xd0 [dump]
[<f8870c49>] dump_execute_savedump+0x9/0x50 [dump]
[<f8870d05>] dump_generic_execute+0x75/0x80 [dump]
[<f886f852>] dump_execute+0x52/0xa0 [dump]
[<c010aed3>] die+0x133/0x1b0
[<c010af28>] die+0x188/0x1b0
[<c011dc40>] do_page_fault+0x0/0x54d
[<c011df81>] do_page_fault+0x341/0x54d
[<f88c9c20>] ahd_linux_queue_cmd_complete+0xe0/0x2a0 [aic79xx]
[<c011dc40>] do_page_fault+0x0/0x54d
[<c010a28d>] error_code+0x2d/0x40
[<c01559a4>] page_address+0x14/0xc0
[<c0249501>] blk_recalc_rq_sectors+0xa1/0x110
[<c0127b7a>] printk+0x18a/0x1a0
[<c0249bee>] __end_that_request_first+0x1be/0x240
[<f883fb99>] scsi_end_request+0x29/0xe0 [scsi_mod]
[<f883ff74>] scsi_io_completion+0x324/0x4c0 [scsi_mod]
[<f883a3b2>] scsi_finish_command+0x82/0xf0 [scsi_mod]
[<c0127b7a>] printk+0x18a/0x1a0
[<f883e687>] scsi_error_handler+0x987/0xed0 [scsi_mod]
[<f883dd00>] scsi_error_handler+0x0/0xed0 [scsi_mod]
[<c0107005>] kernel_thread_helper+0x5/0x10


Code: 86 02 84 c0 ba f0 ff ff ff 7f 0e 8b 5c 24 10 89 d0 8b 74 24
<6>LKCD dump already in progress
------------[ cut here ]------------
kernel BUG at kernel/exit.c:833!
invalid operand: 0000 [#3]
SMP
CPU: 0
EIP: 0060:[<c012a108>] Tainted: G U
EFLAGS: 00010282 (2.6.5-7.111-smp)
EIP is at do_exit+0x968/0xb60
eax: 00000001 ebx: 00000000 ecx: 00000000 edx: 00000001
esi: f7fa17c0 edi: f7e1acb0 ebp: f7fa17c0 esp: f7e65bd8
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 220, threadinfo=f7e64000 task=f7e1acb0)
Stack: 00017e5a 00000282 f7e65cf4 c0431a41 00000246 f7e1ad08 00000002 f7e1ad48
f7e65c10 00000202 00000002 f7e1ad08 f7e64000 00000002 f7e65cf4 00000002
c010af50 0000000b c034405a 00000002 00000002 f7e1acb0 c034405a 00000000
Call Trace:
[<c010af50>] do_simd_coprocessor_error+0x0/0xb0
[<c011dc40>] do_page_fault+0x0/0x54d
[<c011df81>] do_page_fault+0x341/0x54d
[<f886fdfe>] dump_lcrash_save_context+0x2e/0x60 [dump]
[<c0119fa1>] dump_send_ipi+0x11/0x20
[<f88710e4>] __dump_save_other_cpus+0xb4/0xe0 [dump]
[<f88700ce>] dump_lcrash_configure_header+0x29e/0x2c0 [dump]
[<c011dc40>] do_page_fault+0x0/0x54d
[<c010a28d>] error_code+0x2d/0x40
[<f88584be>] dump_block_silence+0x1e/0xc0 [dump_blockdev]
[<f8870ae9>] dump_begin+0x59/0xd0 [dump]
[<f8870c49>] dump_execute_savedump+0x9/0x50 [dump]
[<f8870d05>] dump_generic_execute+0x75/0x80 [dump]
[<f886f852>] dump_execute+0x52/0xa0 [dump]
[<c010aed3>] die+0x133/0x1b0
[<c010af28>] die+0x188/0x1b0
[<c011dc40>] do_page_fault+0x0/0x54d
[<c011df81>] do_page_fault+0x341/0x54d
[<f88c9c20>] ahd_linux_queue_cmd_complete+0xe0/0x2a0 [aic79xx]
[<c011dc40>] do_page_fault+0x0/0x54d
[<c010a28d>] error_code+0x2d/0x40
[<c01559a4>] page_address+0x14/0xc0
[<c0249501>] blk_recalc_rq_sectors+0xa1/0x110
[<c0127b7a>] printk+0x18a/0x1a0
[<c0249bee>] __end_that_request_first+0x1be/0x240
[<f883fb99>] scsi_end_request+0x29/0xe0 [scsi_mod]
[<f883ff74>] scsi_io_completion+0x324/0x4c0 [scsi_mod]
[<f883a3b2>] scsi_finish_command+0x82/0xf0 [scsi_mod]
[<c0127b7a>] printk+0x18a/0x1a0
[<f883e687>] scsi_error_handler+0x987/0xed0 [scsi_mod]
[<f883dd00>] scsi_error_handler+0x0/0xed0 [scsi_mod]
[<c0107005>] kernel_thread_helper+0x5/0x10


Code: 0f 0b 41 03 37 43 34 c0 eb fe 8b 6f 10 85 ed 74 ac eb 9b 8b
 <6>LKCD dump already in progress

*** Everything after this point was removed, because CPU 0 continued to fault over and over.
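
For anyone decoding the SCSI error line above: a minimal sketch, assuming the kernel printed the command name ("Read (10)") followed by CDB bytes 1 through 9, with the 0x28 opcode byte itself omitted (which matches the nine bytes shown), and assuming the standard SBC READ(10) layout. decode_read10 is a hypothetical helper, not a kernel or sg3_utils function.

```python
# Hypothetical helper: decode the bytes printed after "Read (10)" in a
# 2.6-era SCSI error line. Assumes they are CDB bytes 1..9 (opcode omitted)
# and that the command uses the standard READ(10) layout.

def decode_read10(printed: str) -> dict:
    """Decode the nine hex bytes that follow 'Read (10)'."""
    b = bytes.fromhex(printed)  # whitespace between hex pairs is ignored
    if len(b) != 9:
        raise ValueError("expected CDB bytes 1..9 of a READ(10) command")
    cdb = bytes([0x28]) + b  # re-attach the READ(10) opcode
    return {
        "opcode": cdb[0],
        "flags": cdb[1],                                      # RDPROTECT/DPO/FUA bits
        "lba": int.from_bytes(cdb[2:6], "big"),               # bytes 2-5, big-endian
        "group": cdb[6] & 0x1F,                               # group number, bits 4-0
        "transfer_length": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8, in blocks
        "control": cdb[9],
    }


if __name__ == "__main__":
    r = decode_read10("00 00 00 00 bf 00 01 00 00")
    info_field = 0xF3  # "Info fld=0xf3" from the sense data
    first, last = r["lba"], r["lba"] + r["transfer_length"] - 1
    print(f"LBA {first}..{last}, medium error at {info_field}")
```

With the bytes from the log, this gives LBA 191 (0xbf) and a transfer length of 256 blocks, so the failing command covered LBAs 191 through 446. The sense data's information field, 0xf3 (243), typically reports the LBA of the unrecovered block, and it lies inside that range.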

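Likewise, the value after "Oops:" in the first two reports is, assuming i386 as here, the hardware page-fault error code: bit 0 set means a protection violation (clear means the page was not present), bit 1 set means a write, and bit 2 set means the fault came from user mode. A small sketch to decode it (decode_pf_error is a hypothetical helper):

```python
# Hypothetical helper: decode the i386 page-fault error code printed as
# "Oops: XXXX". Only meaningful for page-fault oopses, not for other traps
# such as "invalid operand" raised by BUG().

def decode_pf_error(code: int) -> str:
    present = "protection violation" if code & 0x1 else "page not present"  # bit 0
    access = "write" if code & 0x2 else "read"                                # bit 1
    mode = "user mode" if code & 0x4 else "kernel mode"                       # bit 2
    return f"{mode} {access}, {present}"


if __name__ == "__main__":
    for code in (0x0000, 0x0002):
        print(f"Oops: {code:04x} -> {decode_pf_error(code)}")
```

decode_pf_error(0x0000) gives "kernel mode read, page not present" (oops #1, the NULL read in page_address), and decode_pf_error(0x0002) gives "kernel mode write, page not present" (oops #2, in dump_block_silence). The third report, "invalid operand: 0000", comes from the BUG() at kernel/exit.c:833 rather than a page fault, so these bits don't apply to it.
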
--
Mark Rustad, MRustad@xxxxxxx

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
