Re: LIO IBLOCK iSCSI Kernel panic on 3.6.11 when running specific IO profile.

Hi Benjamin,

On Thu, 2013-07-11 at 18:41 +0100, Benjamin ESTRABAUD wrote:
> Hi!
> 
> I've come across a strange kernel panic issue with LIO on Linux 3.6.11 
> (b2824f4e0990716407b0c0e7acee75bb6353febf). The issue seems to be linked 
> to late IOs calling back into an already deleted target and looping there 
> until the kernel panics.
> The test setup was the following:
> 
> - Exporting more than one IBLOCK (here two MD RAIDs, but seems to do the 
> same with any block devices) over iSCSI.
> - Running intensive IOs (64 ios depth, 548k ios, 100% write, 100% 
> random) using IO meter from a single fast host
> 
> While the above IOs were running, "deleting" the iSCSI targets using 
> rtslib intermittently caused the kernel to panic.

Please provide the specific rtslib calls required in order to trigger
this bug.

A quick dump of your top-level targetcli object tree would also be
helpful for reference.

> The issue is quite intermittent and happens in roughly 1 out of 20 test 
> runs or fewer.
> We have not yet tried to reproduce the issue on ramdisk or file 
> backstores, but it seems that this could be linked to the asynchronous and 
> slow nature of the iblock backstores. In fact, the issue almost always 
> happens around 100ms after tearing down the target, which could indicate 
> that a BIO comes back and causes the issue.
> 

FYI, the backend type should not make any difference here.

> Attached is the kernel trace we get when running this test.
> 
> We are going to try to roll back or update our kernel, but we were hoping 
> to stay on the 3.6.y branch for now.
> 
> Have you seen this issue before? Do you know if it has been resolved in 
> a later version?

Mmmmm, this smells very much like this v3.6.x specific bug:

target: Fix missing CMD_T_ACTIVE bit regression for pending WRITEs
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/target/target_core_transport.c?id=e627c615553a356f6f70215ebb3933c6e057553e

Because the bugfix was merged after v3.6.y stable support was
discontinued, v3.6.11 does *not* contain this fix.

Also, two other patches in the same bugfix series are not present in
v3.6.11 code:

target: Fix use-after-free in LUN RESET handling
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/target/target_core_transport.c?id=72b59d6ee8adaa51f70377db0a1917ed489bead8

and

target: Release se_cmd when LUN lookup fails for TMR
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/target/target_core_transport.c?id=5a3b6fc0092c5f8dee7820064ee54d2631d48573

So that said, I'd recommend either applying these three patches to your
local v3.6.11 tree, or moving to >= v3.7.10.
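
FWIW, one way to double check which of the three fixes a given tree
already carries is to look for their subject lines in the git log for
drivers/target, since a backported commit in a stable branch gets a
different commit ID than the original. A minimal, untested sketch in
Python, assuming a local git checkout of the kernel source; the helper
names (missing_fixes, subjects_in_tree) are made up for illustration:

```python
import subprocess

# Subject lines of the three fixes named in this thread.
FIX_SUBJECTS = [
    "target: Fix missing CMD_T_ACTIVE bit regression for pending WRITEs",
    "target: Fix use-after-free in LUN RESET handling",
    "target: Release se_cmd when LUN lookup fails for TMR",
]


def missing_fixes(log_subjects, wanted=FIX_SUBJECTS):
    """Return the wanted subjects that do not appear in log_subjects."""
    present = {s.strip() for s in log_subjects}
    return [s for s in wanted if s not in present]


def subjects_in_tree(repo_path, rev="HEAD", path="drivers/target"):
    """Collect subjects of commits reachable from rev that touch path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%s", rev, "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


# Example use (needs a kernel git checkout; the path is hypothetical):
#   for subject in missing_fixes(subjects_in_tree("/path/to/linux")):
#       print("missing:", subject)
```

Matching on subject lines is only a heuristic: it will miss a fix that
was folded into another commit or applied by hand without committing.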

Thanks,

--nab
