On Thu, 12 Jul 2007, Mike Anderson wrote:

> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
> provide input on the Qlogic behavior.
>
> Chandra Seetharaman <sekharan@xxxxxxxxxx> wrote:
> > On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
> > > Hello All,
> > >
> > > I'm not sure if this is the right place for this, but it seems to be the only
> > > mailing list related to dm, multipath, and rdac, as far as I can tell. I've
> > > been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
> > > gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
> > > ql2400_fw.bin.4.00.27) to an IBM DS4000. I am using a version of
> > > multipath-tools that I got with git a few days ago.
> > >
> > > I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
> > > rdac]), and the volume is mountable, etc. It also shows one link as active, the
> > > other as ghost. However, once the active link dies, the volume becomes read
> > > only, and both connections are listed as failed. Most importantly, something
> > > like this shows up in the logs:
> > >
> > > Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
> > > MODE_SELECT command on 8:32
> >
> > It does look like the rdac hardware handler is doing the right thing and
> > the qlogic is dying for some reason.
> >
> > I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
> > and they work fine. Can you try in one of those and see if it is any
> > different.
> >
> > Just an FYI w.r.t multipath tools: please remove the patch
> > http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
> > from your tree for the tools to calculate the path priorities properly.
> >
> >
> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> > > mbx2=8012h mbx3=8002h.
> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> > > dumped (ffffc2000171d000) -- ignoring request...
> > > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> > > recovery - ha= ffff81007e85c530.

Hmm yes, there are some real problems going on within the firmware
which we need to triage. From the snippet above, the driver was able
to capture a firmware dump of a failure (not sure of the timing and how
it relates to the window in which you recognized a 'problem'), but I'll
need you to 'capture' the firmware trace and forward it along to us to
inspect.

1) download the following shell script:

	ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh

2) copy the script to the host (/tmp) which is experiencing the
   problems.

3) reboot and load the driver with the ql2xextended_error_logging
   module parameter set to 1, e.g.:

	$ insmod qla2xxx.ko ql2xextended_error_logging=1

4) rerun your test and monitor the kernel-messages file for a message
   similar to:

	Firmware dump saved to temp buffer (1/adcdabcd)

5) To retrieve the dump, go to a console and type the following:

	# cd /tmp/
	# ./qla_dmp.sh 1

   The value passed to qla_dmp.sh should be the same as the first
   integer in the 'saved to temp buffer' string (in this example, 1).
   If the operation was successful, a message like the following
   should be displayed:

	Firmware dumped to file fw_dump_1_20041217_023222.txt.gz

   Forward the file over.

6) forward over the /var/log/messages file covering the driver load
   and the failure snippet.

Not sure which firmware version you are running, but an additional
datapoint which may be useful after you send the firmware dump is to
download the latest 24xx firmware file from QLogic.com:

	ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin

and retry the test. If you still see problems and a similar 'Firmware
dump saved...' message, follow the steps above again and forward the
same datapoints.
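
Steps 4 and 5 depend on picking the right integer out of the 'saved to temp buffer' message. As a rough sketch only (not part of the QLogic tooling), the lookup could be scripted along these lines. The function names are made up for illustration, and the message layout is assumed to match the single example in step 4 (a decimal integer, a slash, then a hex value):

```python
import re

# Matches the driver message shown in step 4, e.g.
#   Firmware dump saved to temp buffer (1/adcdabcd)
# Assumption: the first field is decimal and the second is hexadecimal.
DUMP_RE = re.compile(
    r"Firmware dump saved to temp buffer \((\d+)/([0-9a-fA-F]+)\)"
)


def find_dump_indexes(log_text):
    """Return the first integer of every 'saved to temp buffer' message, in order."""
    return [int(match.group(1)) for match in DUMP_RE.finditer(log_text)]


def dump_commands(log_text, script="./qla_dmp.sh"):
    """Build one qla_dmp.sh command line per distinct index.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    distinct = []
    for index in find_dump_indexes(log_text):
        if index not in distinct:
            distinct.append(index)
    return [[script, str(index)] for index in distinct]
```

Feeding the contents of /var/log/messages to dump_commands() would give the command lines to run from /tmp, in the order the dumps were logged.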

> > > While this may be something for the maintainer of the qla2xxx module (I can't
> > > figure out where I'd send it, in that case...) I think it may be of interest
> > > that the dm_rdac module tries to push something over the HBA that causes it to
> > > bail completely and start from scratch (it starts init processes and loading
> > > firmware again).
> > >
> > > Not to say that I'm not interested in any help getting this working, that is.
> > > If you have any suggestions on how to get this working, I'd love to hear them.
> > > I'm also willing to guinea pig some testing if you need it (This box still has a
> > > bit before it will have to be put in use). I may use redhat to ensure that it's
> > > not just a broken HBA, but for the long run we would like it to join our gentoo
> > > environment.
> > >
> > > Thanks!
> > > Brian De Wolf
> > >
> > > PS- If the subject mislead you because you feel that this is just a qla2xxx
> > > problem, I'm sorry for wasting your time.

Regards,
Andrew Vasquez
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html