Hi,
I've set up an IBM z10 LPAR (mainframe server) with a self-built kernel.
An IBM DS8000 storage server was attached to the System z10. Ten SCSI
LUNs were assigned to the LPAR, each via two paths.
Example:
36005076303ffc48e000000000000c03e dm-2 IBM,2107900
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=enabled
|- 2:0:1:1077821632 sde 8:64 active ready running
`- 3:0:2:1077821632 sdf 8:80 active ready running
Special parameter settings: dev_loss_tmo=90 sec; fast_io_fail_tmo=5 sec
Kernel version: 2.6.29-37.x.20090604
multipath tools: multipath-tools v0.4.9 (04/04, 2009)
device-mapper: device-mapper-1.02.27-7.fc10.s390x,
device-mapper-libs-1.02.27-7.fc10.s390x
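
For anyone trying to reproduce this, the two timeouts above can be
double-checked per remote port before a test run. This is only a rough
sketch, assuming the FC transport class exposes them under
/sys/class/fc_remote_ports/rport-*/ (the usual location, but I have not
verified the layout on every kernel). The function name and the
sysfs_root parameter are made up for this example:

```python
import glob
import os


def read_rport_timeouts(sysfs_root="/sys/class/fc_remote_ports"):
    """Return {rport_name: {"dev_loss_tmo": str|None, "fast_io_fail_tmo": str|None}}.

    sysfs_root is a parameter only so the function can be pointed at a
    different directory; the default is an assumption about where the
    FC transport class publishes the attributes.
    """
    result = {}
    for rport in sorted(glob.glob(os.path.join(sysfs_root, "rport-*"))):
        values = {}
        for attr in ("dev_loss_tmo", "fast_io_fail_tmo"):
            try:
                with open(os.path.join(rport, attr)) as f:
                    values[attr] = f.read().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel.
                values[attr] = None
        result[os.path.basename(rport)] = values
    return result
```

Calling read_rport_timeouts() and printing the result should show 90 and
5 for every remote port if the settings were applied as intended.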
All 10 SCSI LUNs were mounted and filesystem I/O was started (using the
IBM-internal BLAST tool).
To verify correct error recovery of the zFCP driver and
multipath-tools, I repeatedly disabled and re-enabled ports on the
Brocade FC switch between the z10 server and the storage server.
Port off/on times were random, between 10 and 120 sec. After a couple of
hours an Oops occurred. Analysis by zFCP development pointed to dm-multipath:
<3>end_request: I/O error, dev sdg, sector 20654968
<6>sd 3:0:1:1077952704: [sdg] Unhandled error code
<6>sd 3:0:1:1077952704: [sdg] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
<3>end_request: I/O error, dev sdg, sector 20655680
<6>sd 3:0:1:1077952704: [sdg] Unhandled error code
<6>sd 3:0:1:1077952704: [sdg] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
<3>end_request: I/O error, dev sdg, sector 20656392
<6>sd 3:0:1:1077952704: [sdg] Unhandled error code
<6>sd 3:0:1:1077952704: [sdg] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
<3>end_request: I/O error, dev sdg, sector 20657104
<1>Unable to handle kernel pointer dereference at virtual kernel address 0000004418c55000
<4>Oops: 003b [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>Modules linked in: dm_round_robin sunrpc qeth_l2 dm_multipath dm_mod chsc_sch qeth ccwgroup
<4>CPU: 1 Not tainted 2.6.29-Swen_debug #1
<4>Process events/1 (pid: 8, task: 0000000079b00c38, ksp: 0000000079b07b80)
<4>Krnl PSW : 0704100180000000 000003e0001c6186 (trigger_event+0x6/0x14 [dm_multipath])
<4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
<4>Krnl GPRS: 1ec8200000000000 0000004418c55000 0000000074a44968 0000000000000001
<4> 000000000015e96a 0000000079b01420 0000000000000002 fffffffffffffffe
<4> 0000000079b07e38 0000000079cc9e40 000003e0001c6180 0000000079cc9e00
<4> 0000000074a44968 0000000000523e70 000000000015e970 0000000079b07d88
<4>Krnl Code: 000003e0001c6174: ebaff0a00004 lmg %r10,%r15,160(%r15)
<4> 000003e0001c617a: c0f4ffffff8b brcl 15,3e0001c6090
<4> 000003e0001c6180: e3102ea8ff04 lg %r1,-344(%r2)
<4> >000003e0001c6186: e32010000004 lg %r2,0(%r1)
<4> 000003e0001c618c: c0f4fffec1f4 brcl 15,3e00019e574
<4> 000003e0001c6192: 0707 bcr 0,%r7
<4> 000003e0001c6194: eb8ff0580024 stmg %r8,%r15,88(%r15)
<4> 000003e0001c619a: a7f13f80 tmll %r15,16256
<4>Call Trace:
<4>([<000000000015e96a>] run_workqueue+0x196/0x258)
<4> [<000000000015eaa6>] worker_thread+0x7a/0xdc
<4> [<0000000000164662>] kthread+0x6e/0xb0
<4> [<000000000010a4b2>] kernel_thread_starter+0x6/0xc
<4> [<000000000010a4ac>] kernel_thread_starter+0x0/0xc
<4>INFO: lockdep is turned off.
<4>Last Breaking-Event-Address:
<4> [<000000000015e96e>] run_workqueue+0x19a/0x258
<4> <0>Kernel panic - not syncing: Fatal exception: panic_on_oops
<4> I/O error, dev sdf, sector 2947736
<6>sd 2:0:3:1077559488: [sdn] Unhandled error code
<6>sd 2:0:3:1077559488: [sdn] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
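
In case it helps with triage of longer console logs from runs like this,
here is a small sketch that counts the "I/O error" lines per SCSI device.
It is just an illustration: the function name is made up, and the regular
expression assumes the "I/O error, dev <name>, sector <n>" wording seen
in the log above.

```python
import re
from collections import Counter

# Matches e.g. "<3>end_request: I/O error, dev sdg, sector 20654968"
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(log_text):
    """Return a Counter mapping device name to number of I/O error lines."""
    counts = Counter()
    for dev, _sector in IO_ERROR_RE.findall(log_text):
        counts[dev] += 1
    return counts
```

Feeding it the saved console output (for example the contents of a file
read with open(...).read()) gives a quick view of which paths were
failing when the Oops hit.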
If more information is needed, please let me know.
Christian