On Mon, 2011-11-28 at 10:09 +0800, Jim Barber wrote:
> Hi all.
>
> I'd like to present the details of a kernel panic that I've encountered while using LIO Target with QLogic Fibre Channel adapters.
> Hopefully this is the correct place to post.
> If not, please direct me to the appropriate address.
>
> I have set up LIO Target on a Super Micro server with a pair of dual-port QLE2462 Fibre Channel host bus adapters.
> These are hooked into a pair of Brocade switches and presented to a pair of VMware ESXi 5.0 servers.
> I have a RAID-10 array made up of 8x 750GB SATA disks on a 3ware controller, which is presented as a 3000GB (approx 2.7TiB) LUN.
>
> I have managed to create a datastore and have presented it to the VMware servers.
> These see the disk okay and can see all available paths to it.
>
> I am running a Debian testing distribution on the Super Micro server.
> I used the following command to clone the kernel sources with the target patches applied:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core.git
>
> This pulled a kernel that identifies itself as version 3.1.0-rc10.
> I used the .config from the stock Debian 3.1 kernel as a basis for the configuration options and enabled the TCM modules that I needed.
>
> On ESXi 5.0 I had to go into the advanced settings and change the value of DataMover.HardwareAcceleratedInit to 0 to silence a kernel message on the Super Micro server that was regularly being spat out when using the disk on the VMware servers.
> That message was along the lines of: "WRITE_SAME w/o UNMAP bit not supported for Block Discard Emulation"
>
> I have managed to create virtual machines and it mostly works well, but I have now had a couple of kernel OOPSes occur, the first of which I didn't catch.
> However, for the most recent one I had an ssh session open, and it spat messages to syslog so that I could capture them.
> The details of the OOPS are as follows:
>
> kernel:[122001.107498] Call Trace:
> kernel:[122001.107605] Code: 89 ee bf 00 80 00 00 44 89 45 3c 45 8b 8c 24 88 03 00 00 89 44 24 10 41 8b 84 24 80 03 00 00 89 44 24 08 49 8b 84 24 60 01 00 00 <0f> b6 00 89 04 24 31 c0 e8 ad ae fe ff 66 81 4d 28 00 08 eb 5c
> kernel:[122001.107643] CR2: 0000000000000000
> kernel:[122001.108383] Oops: 0000 [#2] SMP
> kernel:[122001.108597] Stack:
> kernel:[122001.108630] Call Trace:
> kernel:[122001.108834] Code: 3f 48 c1 e5 03 48 c1 e0 06 48 8d b0 c0 5d 40 81 48 29 ee e8 d7 2d fe ff 81 4b 14 00 00 00 04 41 59 5b 5d c3 48 8b 87 a0 02 00 00
> kernel:[122001.108912] CR2: fffffffffffffff8
> Nov 25 21:23:58 san kernel: [122001.107292] TARGET_CORE[qla2xxx]: Detected NON_EXISTENT_LUN Access for 0x00000001
> Nov 25 21:23:58 san kernel: [122001.107305] TARGET_CORE[qla2xxx]: Detected NON_EXISTENT_LUN Access for 0x00000001
> Nov 25 21:23:58 san kernel: [122001.107336] BUG: unable to handle kernel NULL pointer dereference at (null)
> Nov 25 21:23:58 san kernel: [122001.107340] IP: [<ffffffffa019ab1b>] qla_tgt_pre_xmit_response+0x1d0/0x2c0 [qla2xxx]

Hi again Jim,

I've been able to reproduce the OOPS in question, and have tracked it down to debug statements in qla_tgt_pre_xmit_response() that dereference se_cmd->t_task_cdb, a pointer that may not be set up yet while an exception response status (here, NON_EXISTENT_LUN) is being sent.

The following patch addresses the issue for me, so please give it a shot when you have a moment. I'll be pushing this to the qla_tgt-3.3 branch shortly, so please update your tree with 'git pull origin qla_tgt-3.3' to retest with this patch, along with the other updates that were pushed into qla_tgt-3.3 over the weekend.

Thanks!
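To see the failure mode outside the kernel, here is a minimal user-space sketch (not driver code). struct mock_se_cmd, cdb_opcode_or_zero() and format_underflow_msg() are hypothetical names invented for illustration; only the NULL check on t_task_cdb and the 0x00 fallback opcode are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct se_cmd.
 * Only the fields this example needs are modelled. In the reported oops,
 * t_task_cdb is still NULL when the NON_EXISTENT_LUN exception response is
 * built, because transport_generic_allocate_tasks() has not run yet. */
struct mock_se_cmd {
	unsigned char *t_task_cdb;   /* SCSI CDB, or NULL if not set up yet */
	unsigned int residual_count; /* residual byte count shown in the debug line */
};

/* Return CDB byte 0 (the SCSI opcode), or 0x00 when no CDB is attached.
 * This is the same guard the patch adds inline at both ql_dbg() call sites. */
static unsigned char cdb_opcode_or_zero(const struct mock_se_cmd *cmd)
{
	assert(cmd != NULL); /* the command must exist; only its CDB may be missing */
	return (cmd->t_task_cdb != NULL) ? cmd->t_task_cdb[0] : 0x00;
}

/* Build a "Residual underflow" style debug line into buf without ever
 * dereferencing a NULL CDB. Returns the length snprintf() would have written. */
static int format_underflow_msg(char *buf, size_t len,
				const struct mock_se_cmd *cmd, int tag)
{
	return snprintf(buf, len, "Residual underflow: %u (tag %d, op %x)",
			cmd->residual_count, tag,
			(unsigned int)cdb_opcode_or_zero(cmd));
}
```

For a command whose t_task_cdb is NULL, residual 0 and tag 7, format_underflow_msg() produces "Residual underflow: 0 (tag 7, op 0)" instead of faulting. Folding the check into one helper is only a suggestion for keeping the underflow and overflow messages in sync; the patch itself repeats the ternary at each call site.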
--nab

commit 35b565b9ce33fdd1d449e5a12b1189573d1d5c95
Author: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
Date:   Mon Nov 28 14:53:09 2011 -0800

    qla2xxx: Check se_cmd->t_task_cdb reference in qla_tgt_pre_xmit_response

    This patch adds an explicit check for se_cmd->t_task_cdb within
    qla_tgt_pre_xmit_response() debug code to address an OOPS where the
    pointer will not be set during an exception in transport_lookup_cmd_lun()
    before transport_generic_allocate_tasks() has been called.

    Reported-by: Jim Barber <jim.barber@xxxxxxxxxxxxx>
    Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 364660d..5b61680 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -2043,16 +2043,16 @@ static int qla_tgt_pre_xmit_response(struct qla_tgt_cmd *cmd, struct qla_tgt_prm
 		prm->residual = se_cmd->residual_count;
 		ql_dbg(ql_dbg_tgt, vha, 0xe012, "Residual underflow: %d (tag %d, "
 			"op %x, bufflen %d, rq_result %x)\n",
-			prm->residual, cmd->tag,
-			se_cmd->t_task_cdb[0], cmd->bufflen,
+			prm->residual, cmd->tag, (se_cmd->t_task_cdb != NULL) ?
+			se_cmd->t_task_cdb[0] : 0x00, cmd->bufflen,
 			prm->rq_result);
 		prm->rq_result |= SS_RESIDUAL_UNDER;
 	} else if (se_cmd->se_cmd_flags & SCF_OVERFLOW_BIT) {
 		prm->residual = se_cmd->residual_count;
 		ql_dbg(ql_dbg_tgt, vha, 0xe013, "Residual overflow: %d (tag %d, "
 			"op %x, bufflen %d, rq_result %x)\n",
-			prm->residual, cmd->tag,
-			se_cmd->t_task_cdb[0], cmd->bufflen,
+			prm->residual, cmd->tag, (se_cmd->t_task_cdb != NULL) ?
+			se_cmd->t_task_cdb[0] : 0x00, cmd->bufflen,
 			prm->rq_result);
 		prm->rq_result |= SS_RESIDUAL_OVER;
 		prm->residual = -prm->residual;