Re: lpfc RAID1 device panics when one device goes away

Actually, I view the configuration as identical to having two locally attached SCSI disks mirrored with software RAID1. The only difference is that the two "drives" (LUNs) are located on a storage array on a SAN. As far as the OS is concerned, the two LUNs are just two separate SCSI drives. I'm speculating that the lpfc driver either does not handle a failed path properly, or requires tuning parameters to be set, before it reports the failure back up to the SCSI driver in a manner which won't cause a panic.
-Mark
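
One way to see the "just two separate SCSI drives" view in the console output quoted further down: 2.4 kernels print block devices as hexadecimal major:minor pairs, so "dev 08:40" is major 8 (SCSI disk, sd) and minor 0x40 = 64. With 16 minors per SCSI disk, that is the fifth disk, /dev/sde, which matches the "Disk failure on sde" line. A minimal Python sketch of that arithmetic follows; the function name decode_sd_dev is made up for illustration, and it only handles major 8 (sda through sdp).

```python
import string

SD_MAJOR = 8            # first SCSI disk major number
MINORS_PER_DISK = 16    # minor 0 = whole disk, 1-15 = partitions


def decode_sd_dev(dev):
    """Map a 2.4-style hex 'major:minor' string (e.g. '08:40') to an sd name.

    Hypothetical helper for illustration; only handles major 8 (sda..sdp).
    """
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != SD_MAJOR:
        raise ValueError("not a major-8 SCSI disk: %s" % dev)
    # Each disk owns a block of 16 minors; the remainder is the partition.
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + string.ascii_lowercase[disk]
    return name if part == 0 else "%s%d" % (name, part)


print(decode_sd_dev("08:40"))  # sde
```

So the I/O errors are all against the whole-disk device for the LUN on the disabled path, as would be expected for a local disk that disappeared.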


Hamilton Andrew wrote:
Mark,

I may be wrong here, and maybe someone out there knows better, but I don't think this will work without PowerPath. PowerPath allows your OS to treat both your HBAs as one, and it load-balances across the two HBAs. Without it you have two independent connections to two LUNs, and that is what is causing the panic. You need something that will treat both your connections as one connection. Even if both your HBAs can talk to both LUNs, the OS is not going to fail over to the one that is working without some sort of go-between, and the kernel does not know it can talk to both LUNs via either HBA. It just knows that it had two connections to the RAID and one of them is gone, so the RAID is no longer available. At least that is the way it would seem to work to me.

My 2 cents. Let me know if you find out something different though.

Drew

-----Original Message-----
From: Bruen, Mark [mailto:mbruen@xxxxxxxxxxxxxx]
Sent: Friday, January 30, 2004 8:54 AM
To: redhat-list@xxxxxxxxxx
Subject: Re: lpfc RAID1 device panics when one device goes away


No, it worked once, but on the next test it panicked again. I'll keep looking. -Mark
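
When repeating a test like this, it may help to read /proc/mdstat right after the path is disabled, to see whether md dropped only the one member before anything else went wrong. A rough Python sketch follows, assuming the usual "[configured/working] [U_]" status line; the sample text is made up (the member names sdc1/sde1 and the block count are illustrative, not taken from this system), and degraded_arrays is a hypothetical helper.

```python
import re

# Made-up sample of /proc/mdstat after one mirror half has failed.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md10 : active raid1 sde1[1](F) sdc1[0]
      8883584 blocks [2/1] [U_]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (configured, working)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md10 : active raid1 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following status line carries "[configured/working]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if status and current:
            configured, working = int(status.group(1)), int(status.group(2))
            if working < configured:
                result[current] = (configured, working)
            current = None
    return result


print(degraded_arrays(SAMPLE_MDSTAT))  # {'md10': (2, 1)}
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.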

Hamilton Andrew wrote:
> Did that fix it? I have an EMC CX600 configured much the same way, but
> I'm using RHEL 2.1AS instead of 3.0. I'm sure there are a ton of
> differences between the two distros.
>
> -----Original Message-----
> From: Bruen, Mark [mailto:mbruen@xxxxxxxxxxxxxx]
> Sent: Wednesday, January 28, 2004 7:09 PM
> To: redhat-list@xxxxxxxxxx
> Subject: Re: lpfc RAID1 device panics when one device goes away
>
>
> I think I have fixed this by changing the partition type of each LUN's (disk)
> partition to "fd" (Linux raid auto).
>
> Bruen, Mark wrote:
> > That will be the config once Veritas and/or EMC support HBA path
> > failover on RedHat AS 3.0. Veritas will support it with DMP in version 4
> > due in Q2/04, EMC has not committed to a date yet with PowerPath. In the
> > interim I'm trying to provide path failover using software RAID1 of two
> > hardware RAID5 LUNs, one on each path (two switches connected to two
> > storage processors connected to two HBAs per server).
> > -Mark
> >
> > Hamilton Andrew wrote:
> >
> >> What's your SAN? Why don't you configure your RAID1 on the SAN and
> >> let it publish that RAID group as one LUN? Are you using any kind of
> >> fibre switch between your cards and your SAN?
> >>
> >> Drew
> >>
> >> -----Original Message-----
> >> From: Bruen, Mark [mailto:mbruen@xxxxxxxxxxxxxx]
> >> Sent: Wednesday, January 28, 2004 3:28 PM
> >> To: redhat-list@xxxxxxxxxx
> >> Subject: lpfc RAID1 device panics when one device goes away
> >>
> >>
> >> I'm running RedHat AS 3.0 kernel 2.4.21-4.ELsmp on a Dell 1750 with 2 Emulex
> >> LP9002DC-E HBAs. I've configured a RAID1 device called /dev/md10 from 2
> >> SAN-based LUNs, /dev/sdc and /dev/sde. Everything works fine until I
> >> disable one of the HBA paths to the disk. Here's the console output:
> >> [root@reacher root]# !lpfc1:1306:LKe:Link Down Event received Data: x2 x2 x0 x20
> >> I/O error: dev 08:40, sector 69792
> >> raid1: Disk failure on sde, disabling device.
> >> Operation continuing on 1 devices
> >> md10: vno@ pspar2e! d?i@
> >> s@kq tAo rec@oqnAst`rIu/Oc
> >> t AaqArra@qyA!@
> >> -v-@ cpont
> >> inI/uOinhgr oihn de_g_r_a_m@vqA@`@ 70288
> >> I/O error: dev 08`I/O sector 70536
> >> I/O error: dev 08:40, sector 70784
> >> I/O error: dev 08:40, sector 71032
> >> I/O error: dev 08:40, sector 71280
> >> I/O error@qA@v@p2!?@
> >> AqA@qA`I/O
> >> BqA@qA@v@p I/Oh 7h____mv@`dev 08:40,
> >> sector 72024
> >> `I/Oerror: dev 08:40, sector 72272
> >> I/O error: dev 08:40, sector 72520
> >> I/O error: dev 08:40, sector 72768
> >> I/O error: dev 08:40, sector 73@qA@v@p2!?@
> >> BqA@qA`I/O
> >> CqA@qA@v@p
> >> I/Ohdeh____mv@`2
> >> I/O error: dev 08:40, `I/Oor 73760
> >> I/O error: dev 08:40, sector 74008
> >> I/O error: dev 08:40, sector 74256
> >> I/O error: dev 08:40, sector 74504
> >> I/O error: dev@qA@v@p2!?@
> >> CqA@qA`I/O
> >> DqA@qA@v@p I/Oh0
> >> h____mv@`8:40, sector 75248
> >> I/O e`I/O: dev 08:40, sector 75496
> >> I/O error: dev 08:40, sector 75744
> >> I/O error: dev 08:40, sector 75992
> >> I/O error: dev 08:40, sector 76240
> >> <@qA@v@p2!?@
> >> DqA@qA`I/O
> >> EqA@qA@v@p I/Oh8:h____mv@` I/O error: dev 08:40,
> >> secto`I/O984
> >> I/O error: dev 08:40, sector 77232
> >> I/O error: dev 08:40, sector 77480
> >> I/O error: dev 08:40, sector 77728
> >> I/O error: dev 08:4@qA@v@p2!?@
> >> EqA@qA`I/O
> >> FqA@qA@v@p I/Oh Ih____mv@`
> >> sector 78352
> >> I/O error:`I/O 08:40, sector 78600
> >> I/O error: dev 08:40, sector 78848
> >> I/O error: dev 08:40, sector 79096
> >> I/O error: dev 08:40, sector 79344
> >> I/@qA@v@p2!?@
> >> FqA@qA`I/O
> >> GqA@qA@v@p I/Oh sh____mv@`error: dev 08:40,
> >> sector
> >> 800`I/O4> I/O error: dev 08:40, sector 80336
> >> I/O error: dev 08:40, sector 80584
> >> I/O error: dev 08:40, sector 80832
> >> I/O error: dev 08:40, se@qA@v@p2!?@
> >> GqA@qA`I/O
> >> HqA@qA@v@p
> >> I/Oherh____mv@`or 81576
> >> I/O error: dev `I/O0, sector 81824
> >> I/O error: dev 08:40, sector 82072
> >> I/O error: dev 08:40, sector 82320
> >> I/O error: dev 08:40, sector 82568
> >> I/O err@qA@v@p2!?@
> >> HqA@qA`I/O
> >> IqA@qA@v@p I/Ohorh____mv@`: dev 08:40,
> >> sector 83312
> >> <4`I/OO error: dev 08:40, sector 83560
> >> I/O error: dev 08:40, sector 83808
> >> I/O error: dev 08:40, sector 84056
> >> Unable to handle kernel paging request at virtual address a0fb8488
> >> printing eip:
> >> c011f694
> >> *pde = 00000000
> >> Oops: 0000
> >> lp parport autofs tg3 floppy microcode keybdev mousedev hid input usb-ohci
> >> usbcore ext3 jbd raid1 raid0 lpfcdd mptscsih mptbase sd_mod scsi_mod
> >> CPU: -1041286984
> >> EIP: 0060:[<c011f694>] Not tainted
> >> EFLAGS: 00010087
> >>
> >> EIP is at do_page_fault [kernel] 0x54 (2.4.21-4.ELsmp)
> >> eax: f55ac544 ebx: f55ac544 ecx: a0fb8488 edx: e0b3c000
> >> esi: c1ef4000 edi: c011f640 ebp: 000000f0 esp: c1ef40c0
> >> ds: 0068 es: 0068 ss: 0068
> >> Process Dmu (pid: 0, stackpage=c1ef3000)
> >> Stack: 00000000 00000002 022c1008 c1eeee4c c1eff274 00000000 00000000 a0fb8488
> >> c17c4520 f58903f4 00000000 c1efd764 c1eee5fc f7fe53c4 00030001 00000000
> >> 00000002 022c100c c1efd780 c1eeba44 00000000 00000000 00000003 c1b968ec
> >> Call Trace: [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4178)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef419c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef41b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4278)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef429c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef42b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4378)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef439c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef43b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4478)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef449c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef44b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4578)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef459c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef45b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4678)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef469c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef46b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4778)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef479c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef47b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4878)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef489c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef48b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4978)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef499c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef49b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4a78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4a9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef4ab4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4b78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4b9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef4bb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4c78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4c9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef4cb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4d78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4d9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef4db4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4e78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4e9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef4eb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4f78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef4f9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef4fb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5078)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef509c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef50b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5178)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef519c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef51b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5278)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef529c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef52b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5378)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef539c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef53b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5478)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef549c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef54b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5578)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef559c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef55b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5678)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef569c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef56b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5778)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef579c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef57b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5878)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef589c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef58b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5978)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef599c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef59b4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5a78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5a9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef5ab4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5b78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5b9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef5bb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5c78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5c9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef5cb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5d78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5d9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef5db4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5e78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5e9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef5eb4)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5f78)
> >> [<c011f640>] do_page_fault [kernel] 0x0 (0xc1ef5f9c)
> >> [<c011f694>] do_page_fault [kernel] 0x54 (0xc1ef5fb4)
> >>
> >> Code: 8b 82 88 c4 47 c0 8b ba 84 c4 47 c0 01 f8 85 c0 0f 85 46 01
> >>
> >> Kernel panic: Fatal exception
> >>
> >> Any ideas?
> >> Thanks.
> >> -Mark
> >>
> >>
> >> --
> >> redhat-list mailing list
> >> unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> >> https://www.redhat.com/mailman/listinfo/redhat-list
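
A note on the oops itself, using only the numbers printed in the log above. The first six bytes of the "Code:" line, 8b 82 88 c4 47 c0, are the i386 encoding of "mov eax, [edx + 0xc047c488]". Adding that displacement to the printed edx (e0b3c000) with 32-bit wraparound gives exactly the faulting address a0fb8488. The "CPU:" value is not a plausible CPU number either; read as an unsigned 32-bit value it is 0xc1ef38b8, which sits in the same address range as the stack values in the trace. Together with the do_page_fault entries repeating every 0x100 bytes of stack, this looks consistent with the page-fault handler itself faulting over and over, but that is my reading of the numbers, not something the log states. The arithmetic as a Python sketch (variable names are mine):

```python
import struct

MASK32 = 0xFFFFFFFF

# Values copied from the oops text.
code = bytes.fromhex("8b 82 88 c4 47 c0")   # first instruction at EIP
edx = 0xE0B3C000
fault_addr = 0xA0FB8488
cpu_field = -1041286984

# 8b /r is "mov r32, r/m32"; ModRM 0x82 = mod 10, reg 000 (eax), rm 010 (edx),
# so the operand is [edx + disp32] with a little-endian 32-bit displacement.
opcode, modrm = code[0], code[1]
assert opcode == 0x8B and modrm == 0x82
(disp32,) = struct.unpack("<I", code[2:6])

# Address computation wraps at 32 bits on i386.
effective = (edx + disp32) & MASK32
print(hex(disp32))                 # 0xc047c488
print(hex(effective))              # 0xa0fb8488
print(effective == fault_addr)     # True
print(hex(cpu_field & MASK32))     # 0xc1ef38b8
```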


