Problem: NULL pointer dereference when disconnecting SAS expander-expander link

Hello,

While testing some new SAS hardware, I have encountered an issue that results 
in an "Unable to handle kernel NULL pointer dereference" message from the 
kernel. The stack trace taken from syslog output is attached.

The problem occurs when connecting then disconnecting an external cable 
between two JBOD disk boxes. The problem does not seem to occur when 
connecting and disconnecting a single disk box directly to the HBA.

To reproduce:
1. Boot with the hardware connected as pictured below. All 32 external disks
   are found and no problems are noticed.
2. Disconnect cable B. The 16 disks and enclosure target from disk box 2 are
   removed with no errors noticed. There are some 'failed to synchronize
   cache' messages if the disks are not removed through /sys first, but the
   error in step 4 occurs either way.
3. Reconnect cable B. The OS gives no indication that anything has happened.
   I have tried waiting for over 2 minutes after connecting the cable.
4. Disconnect cable B again and the attached messages are logged. A hard reset
   is then required to recover.
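
Removing a disk through /sys before pulling the cable, as mentioned in step 2,
can be sketched roughly as follows. This is a minimal Python sketch, assuming
the standard SCSI sysfs "delete" attribute under /sys/block/<disk>/device/.
The disk name sdc and the helper names are placeholders for illustration only,
and writing to the real /sys requires root.

```python
from pathlib import Path


def sysfs_delete_path(disk: str, sysfs_root: str = "/sys") -> Path:
    """Return the path of the SCSI 'delete' attribute for a block device.

    `disk` is a kernel block device name such as "sdc" (placeholder).
    `sysfs_root` is only a parameter so the helper can be pointed at a
    scratch directory instead of the real /sys.
    """
    return Path(sysfs_root) / "block" / disk / "device" / "delete"


def remove_disk(disk: str, sysfs_root: str = "/sys") -> None:
    """Ask the SCSI layer to remove one disk by writing 1 to its 'delete' file."""
    sysfs_delete_path(disk, sysfs_root).write_text("1\n")
```

Calling remove_disk("sdc") as root is intended to be equivalent to
"echo 1 > /sys/block/sdc/device/delete" for that one disk. It would have to be
repeated for each of the 16 disks in disk box 2 before disconnecting cable B.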

+---Host w/LSI3801E HBA------------+
|  LSI1068E                        |
+-####-####------------------------+
  ||||     < Cable A
+-####--Disk box 1-----------------+
| ||||                             |
| LSISASx12A                       |
| ||||  ||\`== LSISASx12A < 8 HDDs |
| ||||  ||                         |
| ||||  \`==== LSISASx12A < 8 HDDs |
+-####-----------------------------+
  ||||     < Cable B
+-####--Disk box 2-----------------+
| ||||                             |
| LSISASx12A                       |
| ||||  ||\`== LSISASx12A < 8 HDDs |
| ||||  ||                         |
| ||||  \`==== LSISASx12A < 8 HDDs |
+-####-----------------------------+


For the attached error, the disk boxes were full of SATA disks and the system 
was running the Debian backports.org 2.6.21-1-amd64 (2.6.21-4~bpo.1) kernel. 
The problem also seems to exist with the Debian etch 2.6.18-4-amd64 kernel. 
Happy to try any kernel versions and configs that would be useful.

The diagram represents my current understanding of the expander setup in the 
disk boxes but I could be mistaken. I can provide further details of the view 
of the hardware from the host if it is of interest.

The server also has an on-board LSI1064 connected to 4 internal SAS HDDs:
$ cat /proc/mpt/summary
ioc0: LSISAS1068E, FwRev=01120000h, Ports=1, MaxQ=511, IRQ=19
ioc1: LSISAS1064, FwRev=01102800h, Ports=1, MaxQ=511, IRQ=58

I will continue to investigate and will report any findings but any help in 
resolving the issue would be greatly appreciated.

-- 
Alex Winawer, Unix Systems Programmer
Systems Development & Support, Oxford University Computing Services

Attached syslog output:
Jul  6 09:46:05 just-read-the-instructions kernel: mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jul  6 09:48:09 just-read-the-instructions kernel: Unable to handle kernel NULL pointer dereference at 00000000000002c0 RIP: 
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff8025a762>] mutex_lock+0x0/0xb
Jul  6 09:48:09 just-read-the-instructions kernel: PGD 10ce07067 PUD 10ea3e067 PMD 0 
Jul  6 09:48:09 just-read-the-instructions kernel: Oops: 0002 [1] SMP 
Jul  6 09:48:09 just-read-the-instructions kernel: CPU 0 
Jul  6 09:48:09 just-read-the-instructions kernel: Modules linked in: raid456 xor ipv6 iptable_mangle iptable_nat nf_nat xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink ipt_owner ipt_REJECT xt_limit ipt_LOG xt_hashlimit ip6_tables ipt_addrtype iptable_filter ip_tables x_tables 8021q serio_raw psmouse i2c_nforce2 shpchp i2c_core pci_hotplug pcspkr k8temp sg sr_mod cdrom joydev evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod ide_generic ata_generic sata_nv libata sd_mod generic usb_storage usbhid hid mptsas mptscsih mptbase scsi_transport_sas amd74xx e1000 scsi_mod forcedeth ide_core ohci_hcd ehci_hcd thermal processor fan
Jul  6 09:48:09 just-read-the-instructions kernel: Pid: 14, comm: events/0 Not tainted 2.6.21-1-amd64 #1
Jul  6 09:48:09 just-read-the-instructions kernel: RIP: 0010:[<ffffffff8025a762>]  [<ffffffff8025a762>] mutex_lock+0x0/0xb
Jul  6 09:48:09 just-read-the-instructions kernel: RSP: 0018:ffff810120201c88  EFLAGS: 00010246
Jul  6 09:48:09 just-read-the-instructions kernel: RAX: 0000000000000000 RBX: ffff81011b1f3000 RCX: 0000000000000000
Jul  6 09:48:09 just-read-the-instructions kernel: RDX: ffff81011a9784c0 RSI: ffff81011b1f3000 RDI: 00000000000002c0
Jul  6 09:48:09 just-read-the-instructions kernel: RBP: 0000000000000004 R08: 000000000000000c R09: ffff81011b3392a0
Jul  6 09:48:09 just-read-the-instructions kernel: R10: 00000000fffffff4 R11: ffff810120201ca8 R12: 0000000000000000
Jul  6 09:48:09 just-read-the-instructions kernel: R13: 00000000000002c0 R14: 0000000000000000 R15: 00000000000005b0
Jul  6 09:48:09 just-read-the-instructions kernel: FS:  00002b86508c56d0(0000) GS:ffffffff804d9000(0000) knlGS:0000000000000000
Jul  6 09:48:09 just-read-the-instructions kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jul  6 09:48:09 just-read-the-instructions kernel: CR2: 00000000000002c0 CR3: 0000000101c9e000 CR4: 00000000000006e0
Jul  6 09:48:09 just-read-the-instructions kernel: Process events/0 (pid: 14, threadinfo ffff810120200000, task ffff81011c0d2100)
Jul  6 09:48:09 just-read-the-instructions kernel: Stack:  ffffffff880b8cca ffff81011bcb29c0 ffff81011bc8a000 ffff81011ad99d80
Jul  6 09:48:09 just-read-the-instructions kernel:  ffffffff880dc789 ffff81011bc8a5e8 ffff810120201cb8 ffff810120201cb8
Jul  6 09:48:09 just-read-the-instructions kernel:  0000000000000000 0000000000000000 ffff81011bc8a000 ffff81011ad99d80
Jul  6 09:48:09 just-read-the-instructions kernel: Call Trace:
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880b8cca>] :scsi_transport_sas:sas_port_delete_phy+0x1a/0x5e
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880dc789>] :mptsas:mptsas_setup_wide_ports+0x72/0x20d
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880dd097>] :mptsas:mptsas_probe_expander_phys+0x3d0/0x427
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880c7265>] :mptbase:mpt_timer_expired+0x0/0x24
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880dd969>] :mptsas:__mptsas_discovery_work+0x16f/0x18a
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880dd984>] :mptsas:mptsas_discovery_work+0x0/0x39
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880dd9a8>] :mptsas:mptsas_discovery_work+0x24/0x39
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff80246fa3>] run_workqueue+0x8f/0x137
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff80243bcf>] worker_thread+0x0/0x14a
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff80243ce3>] worker_thread+0x114/0x14a
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff8027a990>] default_wake_function+0x0/0xe
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff8022f236>] kthread+0xd1/0x100
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff80255f38>] child_rip+0xa/0x12
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff8022f165>] kthread+0x0/0x100
Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff80255f2e>] child_rip+0x0/0x12
Jul  6 09:48:09 just-read-the-instructions kernel: 
Jul  6 09:48:09 just-read-the-instructions kernel: 
Jul  6 09:48:09 just-read-the-instructions kernel: Code: f0 ff 0f 79 05 e8 27 01 00 00 c3 f0 ff 07 7f 05 e8 e9 00 00 
Jul  6 09:48:09 just-read-the-instructions kernel: RIP  [<ffffffff8025a762>] mutex_lock+0x0/0xb
Jul  6 09:48:09 just-read-the-instructions kernel:  RSP <ffff810120201c88>
Jul  6 09:48:09 just-read-the-instructions kernel: CR2: 00000000000002c0
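
As a rough aid for reading the trace above, the frames after "Call Trace:" can
be pulled out of the syslog lines and split into module, symbol and offset.
This is a minimal Python sketch that assumes the oops line format shown in this
log (frames written as "[<address>] :module:symbol+0xoff/0xsize", with the
":module:" part absent for built-in symbols). The names Frame and
parse_call_trace are made up for this example.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# One frame: "[<addr>] :module:symbol+0xoff/0xsize"; ":module:" is optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    address: int
    module: Optional[str]  # None for symbols built into the kernel image
    symbol: str
    offset: int
    size: int


def parse_call_trace(lines: Iterable[str]) -> List[Frame]:
    """Return the frames listed after the first 'Call Trace:' line."""
    frames: List[Frame] = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line ends the trace
        frames.append(
            Frame(
                address=int(match.group("addr"), 16),
                module=match.group("module"),
                symbol=match.group("symbol"),
                offset=int(match.group("offset"), 16),
                size=int(match.group("size"), 16),
            )
        )
    return frames


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Jul  6 09:48:09 just-read-the-instructions kernel: Call Trace:",
    "Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880b8cca>] :scsi_transport_sas:sas_port_delete_phy+0x1a/0x5e",
    "Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff880dc789>] :mptsas:mptsas_setup_wide_ports+0x72/0x20d",
    "Jul  6 09:48:09 just-read-the-instructions kernel:  [<ffffffff80246fa3>] run_workqueue+0x8f/0x137",
    "Jul  6 09:48:09 just-read-the-instructions kernel: ",
]

if __name__ == "__main__":
    for frame in parse_call_trace(SAMPLE):
        print(frame.module or "(kernel)", frame.symbol, hex(frame.offset))
```

Run over the full log, this lists the path from the mptsas discovery work down
to sas_port_delete_phy, the last frame before the faulting mutex_lock.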
