RE: [RFC] Megaraid update, submission

Andre Hedrick <andre@xxxxxxxxxxxxx> · Tue, 16 May 2006 13:47:15 -0700 (PDT)

Warning: oops in message; ignore if you hate reading pasted oopses.

Seokmann,

So there should be no (sane) heroic attempts to recover the card state?
Please look again: the path is only retried, and it follows the original
operational path which resulted in setting the 'raid_dev->hw_error' flag.
If I am reading the code correctly, the *->quiescent flag controls command
submission to the card.  Thus all commands already submitted to the
firmware are owned by the card, and should be allowed to complete their
IOs regardless?  I have seen as many as 20 requests outstanding (the max
to date), and terminating those transactions surely blows apart any
filesystem: I have had filesystems, and in several cases attached arrays,
just vaporize when forced to reboot with 'hw_error' set.
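
A toy model of that reading of the quiescent flag, as a sketch only.  This
is not the driver's code: struct adapter_model, submit_cmd() and
fw_complete_one() are names made up for illustration.  The assumption is
that quiescing holds back new submissions while commands the firmware
already owns are left to drain on their own.

```c
#include <assert.h>
#include <stdbool.h>

enum submit_result { SUBMIT_ISSUED, SUBMIT_DEFERRED };

/* Hypothetical, simplified view of the adapter state. */
struct adapter_model {
    bool quiescent; /* when set, nothing new is handed to the firmware */
    int  fw_owned;  /* commands already owned by the card */
    int  pending;   /* commands held back by the driver */
};

/* New work is either issued to the firmware or parked on the driver side. */
enum submit_result submit_cmd(struct adapter_model *a)
{
    if (a->quiescent) {
        a->pending++;
        return SUBMIT_DEFERRED;
    }
    a->fw_owned++;
    return SUBMIT_ISSUED;
}

/* The firmware finishes one command it owns; quiescent does not matter. */
void fw_complete_one(struct adapter_model *a)
{
    assert(a->fw_owned > 0); /* completing with nothing outstanding is a model bug */
    a->fw_owned--;
}
```

In this model, setting quiescent with 4 commands outstanding defers the
next submission but leaves the 4 firmware-owned commands free to complete.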

So since the pci_master_abort for the card is being rejected ...

Let's move on to the list management issues, where timeouts on ioctl calls
have produced NULL pointers when one performs an add vs. a move to transfer
ownership of a given scb between pools.
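
To illustrate the add vs. move difference, here is a minimal userspace
sketch.  It re-creates just enough of the kernel's circular list_head to
show the effect; it is not the megaraid code, and list_count(),
transfer_with_add() and transfer_with_move() are hypothetical helpers.
Adding an entry to a second list without first unlinking it leaves the
first list pointing at an entry that now chains into the other list.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Link n in before head; does NOT unlink n from wherever it was. */
void list_add_tail(struct list_head *n, struct list_head *head)
{
    struct list_head *prev = head->prev;
    n->next = head;
    n->prev = prev;
    prev->next = n;
    head->prev = n;
}

void list_del(struct list_head *e)
{
    assert(e->next != NULL && e->prev != NULL); /* must be linked */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
}

/* Unlink from the old list, then link into the new one. */
void list_move_tail(struct list_head *e, struct list_head *head)
{
    list_del(e);
    list_add_tail(e, head);
}

/* Count entries; -1 if the walk does not get back to head within limit. */
int list_count(const struct list_head *head, int limit)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next) {
        if (p == NULL || ++n > limit)
            return -1;
    }
    return n;
}

/* Hypothetical scb with one list linkage, owned by one pool at a time. */
struct scb_model { struct list_head list; };

/* Returns the source pool's count after an add without unlink. */
int transfer_with_add(void)
{
    struct list_head pend, done;
    struct scb_model scb;
    init_list_head(&pend);
    init_list_head(&done);
    list_add_tail(&scb.list, &pend);
    list_add_tail(&scb.list, &done);  /* scb is still linked on pend */
    return list_count(&pend, 16);
}

/* Returns the source pool's count after a proper move. */
int transfer_with_move(void)
{
    struct list_head pend, done;
    struct scb_model scb;
    init_list_head(&pend);
    init_list_head(&done);
    list_add_tail(&scb.list, &pend);
    list_move_tail(&scb.list, &done);
    return list_count(&pend, 16);
}
```

With the plain add, the walk over the source pool never returns to its own
head (list_count() gives up and reports -1); with the move, the source pool
is empty as expected.  This only shows the general hazard, not the exact
path taken in the driver.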

Fixing the list management may mean the pci_master_abort is not needed.

The NULL pointer:

Mar 29 00:09:53 5000 kernel: megaraid: aborting-464723 cmd=2a <c=1 t=0 l=0>
Mar 29 00:09:53 5000 kernel: megaraid abort: 464723:40[255:0], fw owner
Mar 29 00:09:53 5000 kernel: megaraid: aborting-464744 cmd=2a <c=1 t=0 l=0>
Mar 29 00:09:53 5000 kernel: megaraid abort: 464744:12[255:0], fw owner
Mar 29 00:09:53 5000 kernel: megaraid: aborting-464745 cmd=2a <c=1 t=0 l=0>
Mar 29 00:09:53 5000 kernel: megaraid abort: 464745:23[255:0], fw owner
Mar 29 00:09:53 5000 kernel: megaraid: aborting-464746 cmd=2a <c=1 t=0 l=0>
Mar 29 00:09:53 5000 kernel: megaraid abort: 464746:0[255:0], fw owner
Mar 29 00:09:53 5000 kernel: megaraid: aborting-464747 cmd=2a <c=1 t=0 l=0>
Mar 29 00:09:53 5000 kernel: megaraid abort: 464747[255:0], driver owner 
Mar 29 00:09:53 5000 kernel: megaraid: reseting the host...
Mar 29 00:09:53 5000 kernel: megaraid: 464723:128[65535:65535], reset from pending list
Mar 29 00:09:53 5000 kernel: megaraid: 4 outstanding commands. Max wait 180 sec
Mar 29 00:09:53 5000 kernel: megaraid mbox: Wait for 4 commands to complete:180
...
Mar 29 00:11:54 5000 kernel: megaraid mbox: Wait for 4 commands to complete:60
Mar 29 00:11:59 5000 kernel: megaraid mbox: Wait for 4 commands to complete:55
Mar 29 00:12:04 5000 kernel: megaraid mbox: Wait for 4 commands to complete:50
Mar 29 00:12:08 5000 kernel: megaraid mbox: reset sequence completed sucessfully
Mar 29 00:12:08 5000 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mar 29 00:12:08 5000 kernel:  printing eip:
Mar 29 00:12:08 5000 kernel: f881f739
Mar 29 00:12:08 5000 kernel: *pde = 00000000
Mar 29 00:12:08 5000 kernel: Oops: 0002 [#1]
Mar 29 00:12:08 5000 kernel: SMP
Mar 29 00:12:08 5000 kernel: Modules linked in: xfs md5 ipv6 af_packet button thermal processor fan ac battery tsdev joydev evdev usbkbd usbhid e1000 intel_agp agpgart ehci_hcd uhci_hcd usbcore rtc ext3 jbd sd_mod megaraid_mbox megaraid_mm ata_piix  libata scsi_mod
Mar 29 00:12:08 5000 kernel: CPU:    0
Mar 29 00:12:08 5000 kernel: EIP:    0060:[pg0+943802169/1069495296] Tainted: P      VLI
Mar 29 00:12:08 5000 kernel: EIP:    0060:[<f881f739>]    Tainted: P VLI
Mar 29 00:12:08 5000 kernel: EFLAGS: 00010046   (2.6.10)
Mar 29 00:12:08 5000 kernel: EIP is at megaraid_mbox_build_cmd+0x979/0xce0 [megaraid_mbox]
Mar 29 00:12:08 5000 kernel: eax: 00000000   ebx: 00000000   ecx: 0000000d edx: 79473000
Mar 29 00:12:08 5000 kernel: esi: c238f780   edi: c23af800   ebp: f7491f10 esp: f7491e98
Mar 29 00:12:09 5000 kernel: ds: 007b   es: 007b   ss: 0068
Mar 29 00:12:09 5000 kernel: Process scsi_eh_1 (pid: 885, threadinfo=f7490000 task=f7dde020)
Mar 29 00:12:09 5000 kernel: Stack: c23e3c00 f7de3000 f7491ebc f66fc2a0 c23e3c00 0000000d c226a42c f7436038
Mar 29 00:12:09 5000 kernel:        f7436030 f7491ee8 c23b1010 f7491ed0 011d2df4 c226aa34 c226aa2c c226a42c
Mar 29 00:12:09 5000 kernel:        00000000 000000ff c2268000 6e616373 676e696e 00000000 00000086 70696b73
Mar 29 00:12:09 5000 kernel: Call Trace:
Mar 29 00:12:09 5000 kernel:  [show_stack+171/192] show_stack+0xab/0xc0
Mar 29 00:12:09 5000 kernel:  [<c0103e9b>] show_stack+0xab/0xc0
Mar 29 00:12:09 5000 kernel:  [show_registers+351/464] show_registers+0x15f/0x1d0
Mar 29 00:12:09 5000 kernel:  [<c010402f>] show_registers+0x15f/0x1d0
Mar 29 00:12:09 5000 kernel:  [die+244/400] die+0xf4/0x190
Mar 29 00:12:09 5000 kernel:  [<c0104244>] die+0xf4/0x190
Mar 29 00:12:09 5000 kernel:  [do_page_fault+1172/1715] do_page_fault+0x494/0x6b3
Mar 29 00:12:09 5000 kernel:  [<c0117394>] do_page_fault+0x494/0x6b3
Mar 29 00:12:09 5000 kernel:  [error_code+43/48] error_code+0x2b/0x30
Mar 29 00:12:09 5000 kernel:  [<c0103aeb>] error_code+0x2b/0x30
Mar 29 00:12:09 5000 kernel:  [pg0+943799680/1069495296] megaraid_queue_command+0x50/0x90 [megaraid_mbox]
Mar 29 00:12:09 5000 kernel:  [<f881ed80>] megaraid_queue_command+0x50/0x90 [megaraid_mbox]
Mar 29 00:12:09 5000 kernel:  [pg0+943941731/1069495296] scsi_dispatch_cmd+0x173/0x290 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [<f8841863>] scsi_dispatch_cmd+0x173/0x290 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [pg0+943966809/1069495296] scsi_request_fn+0x1e9/0x430 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [blk_run_queue+42/64] blk_run_queue+0x2a/0x40
Mar 29 00:12:09 5000 kernel:  [<c023aeaa>] blk_run_queue+0x2a/0x40
Mar 29 00:12:09 5000 kernel:  [pg0+943963243/1069495296] scsi_run_host_queues+0x2b/0x50 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [<f8846c6b>] scsi_run_host_queues+0x2b/0x50 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [pg0+943960213/1069495296] scsi_error_handler+0x85/0x170 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [<f8846095>] scsi_error_handler+0x85/0x170 [scsi_mod]
Mar 29 00:12:09 5000 kernel:  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Mar 29 00:12:09 5000 kernel:  [<c01012d5>] kernel_thread_helper+0x5/0x10
Mar 29 00:12:09 5000 kernel: Code: 2c 82 f8 c7 47 20 01 00 00 00 8b 4d 9c 85 c9 74 39 8b 4d 9c 31 db 8d b6 00 00 00 00 8d bf 00 00 00 00 8b 55 a0 8b 42 10 8b 56 08 <89> 14 18 31 d2 89 54 18 04 8b 45 a0 8b 50 10 8b 46 0c 83 c6 10
Mar 29 00:14:23 5000 kernel:  <4>megaraid cmm: ioctl timed out
Mar 29 00:14:23 5000 kernel: megaraid cmm: controller cannot accept cmds due to earlier errors
Mar 29 00:14:24 5000 last message repeated 3 times
...
until reboot
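
For anyone reading the oops above: the "0002" after "Oops:" is the i386
page-fault error code, and its low three bits say what kind of access
faulted.  The sketch below decodes them; decode_pf_error() and struct
pf_info are illustrative names, not kernel interfaces.  The marked bytes
in the Code: line, <89> 14 18, appear to decode as a store of %edx to
(%eax,%ebx,1), which with eax and ebx both zero would match a write to
address 00000000.

```c
#include <assert.h>
#include <stdbool.h>

/* Low three bits of the i386 page-fault error code. */
struct pf_info {
    bool protection; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;      /* bit 1: 1 = write access, 0 = read access */
    bool user;       /* bit 2: 1 = fault taken in user mode, 0 = kernel mode */
};

struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;
    info.protection = (code & 0x1) != 0;
    info.write      = (code & 0x2) != 0;
    info.user       = (code & 0x4) != 0;
    return info;
}

/* Self-check against the "Oops: 0002" line in the log. */
void check_logged_oops(void)
{
    struct pf_info info = decode_pf_error(0x0002);
    assert(!info.protection); /* page not present */
    assert(info.write);       /* write access */
    assert(!info.user);       /* kernel mode */
}
```

So 0002 reads as a kernel-mode write to a not-present page, consistent
with the NULL pointer dereference reported in megaraid_mbox_build_cmd.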

I know everyone will rant that there is a taint.  Logs without the taint
marker set do exist; I just do not have immediate access to them.

I will post the patch on kernel.org, and it can be adopted or dumped.
The posting to the list was to follow the patch submission rules.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Tue, 16 May 2006, Ju, Seokmann wrote:

> Hi,
> 
> I cannot agree with the changes in the patch, for the following reasons.
> 
> On Tuesday, May 16, 2006 1:44 PM, Andre Hedrick wrote:
> > Random (hard to reproduce without injecting noise into the SATA
> > connector or cable) hardware error states which lock the card and in
> > the majority of cases caused the array to be lost.  If the array was
> > not lost, then a drive was failed but one could not remove/replace it
> > with a new drive.  Thus adding a pci_master_abort test and clear
> > function proved to allow recovery in all cases where the card shut
> > down communication to the host.  This may not address all cases;
> > however, this is clearly a missing part of the driver base when entry
> > to eh_scsi_* begins.
> If 'raid_dev->hw_error' is non-zero, the controller has gone bad and will not (and, to avoid further memory corruption, should not) be recovered without a reboot.
> The overall issue described here is already taken care of by the patch that I've submitted.
> The patch has been accepted and should be available in 2.6.17-rc1-mm3, as specified in Andrew Morton's email.
> > The compound issue in the failed recovery resulted in a NULL pointer
> > dereference in the various list_head calls.  After changing the
> > individual list_add calls to list_move and such, the NULL pointer
> > issue has never shown up in the past 6 weeks of heavy testing.
> I'm not sure how these changes help with the issue. Furthermore, I'm not sure what _the NULL pointer issue_ refers to. If you see the issue with the driver available in 2.6.17-rc1-mm3, please let me know.
> The following link will lead you to further details of the patch.
> http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=c005fb4fb2d23ba29ad21dee5042b2f8451ca8ba
> 
> Thank you,
> 
> Seokmann
> 
> > -----Original Message-----
> > From: Andre Hedrick [mailto:andre@xxxxxxxxxxxxx] 
> > Sent: Tuesday, May 16, 2006 1:44 PM
> > To: linux-scsi@xxxxxxxxxxxxxxx; Ju, Seokmann; Andrew Morton
> > Cc: James Bottomley; Christoph Hellwig; Mukker, Atul
> > Subject: [RFC] Megaraid update, submission
> > 
> > 
> > Linux-scsi, et al.
> > 
> > The following patch addresses two major issues found under extensive
> > testing.
> > 
> > While pounding data IO down the card and performing large-scale
> > queries to the controller about device state and function parameters,
> > the following were discovered.
> > 
> > Random (hard to reproduce without injecting noise into the SATA
> > connector or cable) hardware error states which lock the card and in
> > the majority of cases caused the array to be lost.  If the array was
> > not lost, then a drive was failed but one could not remove/replace it
> > with a new drive.  Thus adding a pci_master_abort test and clear
> > function proved to allow recovery in all cases where the card shut
> > down communication to the host.  This may not address all cases;
> > however, this is clearly a missing part of the driver base when entry
> > to eh_scsi_* begins.
> > 
> > The compound issue in the failed recovery resulted in a NULL pointer
> > dereference in the various list_head calls.  After changing the
> > individual list_add calls to list_move and such, the NULL pointer
> > issue has never shown up in the past 6 weeks of heavy testing.
> > 
> > In all cases in the past, the baseline for error was 6:1, meaning
> > either one system in six failed and/or one in six test/stress runs
> > failed.  With the attached changes, there have been zero failures in
> > the past three weeks.  This sounds great, but I wish it would fail to
> > allow some statistics of improved error handling.
> > 
> > Please note the changes to SAS are minor and not tested, but seem
> > correct for the entire directory code base.  SAS shares the CMM core
> > with MBOX, thus the rationale for changes to SAS.
> > 
> > Please comment and provide suggestions.
> > 
> > Cheers,
> > 
> > Andre Hedrick
> > LAD Storage Consulting Group
> > 

-
: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html