On Tue, Aug 9, 2011 at 8:18 AM, Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx> wrote:
> From: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx>
>
> If a physical device exposed to the OS by hpsa
> is replaced (e.g. one hot plug tape drive is replaced
> by another, or a tape drive is placed into "OBDR" mode
> in which it acts like a CD-ROM device) and a rescan is
> initiated, the replaced device will be added to the
> SCSI midlayer with target and lun numbers set to -1.
> After that, a panic is likely to ensue. When a physical
> device is replaced, the lun and target number should be
> preserved.
>
> Signed-off-by: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx>
> ---
>  drivers/scsi/hpsa.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 1f32f06..b200b73 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -676,6 +676,16 @@ static void hpsa_scsi_replace_entry(struct ctlr_info *h, int hostno,
>  	BUG_ON(entry < 0 || entry >= HPSA_MAX_SCSI_DEVS_PER_HBA);
>  	removed[*nremoved] = h->dev[entry];
>  	(*nremoved)++;
> +
> +	/*
> +	 * New physical devices won't have target/lun assigned yet
> +	 * so we need to preserve the values in the slot we are replacing.
> +	 */
> +	if (new_entry->target == -1) {
> +		new_entry->target = h->dev[entry]->target;
> +		new_entry->lun = h->dev[entry]->lun;
> +	}
> +
>  	h->dev[entry] = new_entry;
>  	added[*nadded] = new_entry;
>  	(*nadded)++;
>

Despite the above patch, which I do think is correct, I can still get
a panic (on RHEL 6.1 with a 3.1.0-rc1 kernel). To reproduce it, I use a
program that sends a particular MODE SELECT to switch a tape drive's
personality into and out of OBDR mode (which makes the device type flip
between sequential access and CD-ROM), and then do
"echo 1 > /sys/.../scsi_host/host1/rescan" to make the hpsa driver
rescan for devices and update the SCSI midlayer.

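For anyone following along without the driver source handy, here is a
minimal user-space sketch of what the quoted hunk does to a slot. It is
only a model: `struct model_dev`, `MODEL_MAX_DEVS` and
`model_replace_entry()` are hypothetical names invented for illustration,
not the driver's, and the bookkeeping of the removed[]/added[] arrays is
left out.

```c
#include <assert.h>

/* Hypothetical size; stands in for HPSA_MAX_SCSI_DEVS_PER_HBA. */
#define MODEL_MAX_DEVS 8

/* Hypothetical stand-in for the driver's per-device structure. */
struct model_dev {
	int target;	/* -1 means "not assigned yet" */
	int lun;	/* -1 means "not assigned yet" */
};

/*
 * Put new_entry into slot `entry` and return the device that was there
 * before. If new_entry has no target assigned yet, it inherits target
 * and lun from the device it replaces; otherwise it keeps its own.
 */
static struct model_dev *model_replace_entry(struct model_dev *devs[],
					     int entry,
					     struct model_dev *new_entry)
{
	struct model_dev *old;

	/* Plays the role of the BUG_ON() bounds check in the hunk. */
	assert(entry >= 0 && entry < MODEL_MAX_DEVS);

	old = devs[entry];
	if (new_entry->target == -1) {
		new_entry->target = old->target;
		new_entry->lun = old->lun;
	}
	devs[entry] = new_entry;
	return old;
}
```

In this model, replacing a tape drive at target 2, lun 0 with a freshly
discovered device whose target and lun are still -1 leaves the new
device at target 2, lun 0, so nothing with -1 addresses would be handed
on to the midlayer.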
The panic appears to be some interaction between the block layer,
SG_IO, nautilus (which loves to poke at CD-ROM devices), the cdrom
driver, and the hpsa driver's way of updating the SCSI midlayer's
notion of what devices are present.

Panic looks like this:

------------[ cut here ]------------
kernel BUG at block/cfq-iosched.c:1195!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: sr_mod cdrom nfs lockd fscache auth_rpcgss nfs_acl fuse ip6t]

Pid: 3388, comm: cdrom_id Not tainted 3.1.0-rc1+ #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff812344e2>]  [<ffffffff812344e2>] cfq_put_cfqg+0xc2/0xd0
RSP: 0018:ffff8805f5a75af8  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8805f3342848 RCX: 0000000000000077
RDX: 0000000000000000 RSI: ffff8805f81dd498 RDI: ffff8805f3342848
RBP: ffff8805f5a75b08 R08: 00c0000000000000 R09: 0600000000000000
R10: 000000b911dc0248 R11: 0000000000000000 R12: ffff8805f7c657b8
R13: 0000000002224800 R14: ffff8805f81dd498 R15: ffff8805f474b440
FS:  00007f58a528e700(0000) GS:ffff88061f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003f708abd60 CR3: 00000005f721f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cdrom_id (pid: 3388, threadinfo ffff8805f5a74000, task ffff8805f4e3f540)
Stack:
 ffff8805f5a75af8 ffff8805f81dd498 ffff8805f5a75b28 ffffffff81238198
 ffff8805f47bc038 ffff8805f81dd498 ffff8805f5a75b38 ffffffff8121cc2e
 ffff8805f5a75b68 ffffffff81223713 0000000000000000 ffff8805f47bc038
Call Trace:
 [<ffffffff81238198>] cfq_put_request+0x68/0x90
 [<ffffffff8121cc2e>] elv_put_request+0x1e/0x20
 [<ffffffff81223713>] __blk_put_request+0xb3/0xe0
 [<ffffffff81223daa>] blk_put_request+0x3a/0x60
 [<ffffffff8122d980>] sg_io+0x1b0/0x400
 [<ffffffffa05498a1>] ? sr_do_ioctl+0x191/0x310 [sr_mod]
 [<ffffffff8122e230>] scsi_cmd_ioctl+0x2a0/0x4c0
 [<ffffffffa054964d>] ? sr_drive_status+0x6d/0x100 [sr_mod]
 [<ffffffff811822fd>] ? mntput+0x1d/0x30
 [<ffffffff8116f162>] ? path_put+0x22/0x30
 [<ffffffffa053d2a1>] cdrom_ioctl+0x51/0xa60 [cdrom]
 [<ffffffff81172ef9>] ? path_openat+0x109/0x3e0
 [<ffffffffa05488c6>] sr_block_ioctl+0x76/0xf0 [sr_mod]
 [<ffffffff8122a778>] __blkdev_driver_ioctl+0x28/0x30
 [<ffffffff8122ac4e>] blkdev_ioctl+0x1fe/0x6e0
 [<ffffffff811989fc>] block_ioctl+0x3c/0x40
 [<ffffffff8117640c>] do_vfs_ioctl+0x8c/0x340
 [<ffffffff811703a5>] ? putname+0x35/0x50
 [<ffffffff81176761>] sys_ioctl+0xa1/0xb0
 [<ffffffff814ed842>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 48 83 c7 03 83 f9 03 75 9f 48 8b bb 20 03 00 00 e8 81 1d ef ff 4
RIP  [<ffffffff812344e2>] cfq_put_cfqg+0xc2/0xd0
 RSP <ffff8805f5a75af8>

I'm guessing the queue is getting torn down while SG_IO is trying to
put requests (from Nautilus) on it, but I'm not quite sure precisely
where things begin to go off the rails.

-- steve
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html