Re: Kernel Oops while closing iSCSI connection [transport_free_dev_tasks]

Henning Becker <h.becker@xxxxxxxxxxxxxxxx> · Tue, 03 Apr 2012 08:56:28 +0200

On Monday, 2 April 2012, 19:27:25 you wrote:
> On Sat, 2012-03-31 at 21:49 +0200, Henning Becker wrote:
> > Hello,
> > I'm using LIO iSCSI target on top of a pacemaker cluster, to provide a
> > redundant replicated storage.
> > 
> > I randomly get Kernel Oops in transport_free_dev_tasks, while moving the
> > target from one node to the other.
> > 
> > Kernel log says the following (Kernel 3.3.0-rc6):
> > http://pastebin.com/tvm3tK7Z Another log (Kernel 3.2.0):
> > http://pastebin.com/wMNER3We
> > 
> > Distribution is Debian and it seems only to happen, if there is an iscsi
> > connection.
> > 
> > Any hints?
> 
> Hello Henning,
> 
> It would be helpful to know a bit more about the target configuration
> that is triggering this bug, and which cluster resource callbacks are
> being invoked to shut down the individual /sys/kernel/config/target/iscsi/
> endpoints to perform the move.
Hello Nicholas,
I've recorded an inotify log of /sys/kernel/config for you. It's here:
http://pastebin.com/vNEe6vR5

It seems the Oops happens while disabling the target (writing "0" to
tpgt_1/enable).
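
To make that step concrete, here is a minimal Python sketch of the write in
question. It assumes the usual LIO configfs layout
/sys/kernel/config/target/iscsi/<IQN>/tpgt_1/enable. The helper names are
made up for illustration and are not part of lio_node or the resource agents.

```python
import os

# configfs root for the iSCSI fabric, as quoted earlier in this thread
CONFIGFS_ISCSI = "/sys/kernel/config/target/iscsi"


def tpg_enable_path(iqn, tpgt=1, root=CONFIGFS_ISCSI):
    """Build <root>/<iqn>/tpgt_<n>/enable (assumed LIO configfs layout)."""
    return os.path.join(root, iqn, "tpgt_%d" % tpgt, "enable")


def set_tpg_enabled(iqn, enabled, tpgt=1, root=CONFIGFS_ISCSI):
    """Write "1" or "0" to the TPG enable attribute and return its path.

    On a real target this needs root privileges; the `root` parameter
    only exists so the helper can be pointed at a scratch directory.
    """
    path = tpg_enable_path(iqn, tpgt, root)
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path
```

Calling set_tpg_enabled("iqn.2012-04.lan.storage:iscsi.storage", False) on
the active node would correspond to the write of "0" seen in the inotify log.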

The configuration is nothing special. I'm using the pacemaker resource agents
iSCSILogicalUnit and iSCSITarget to configure my LUNs and my target. These
agents use lio_node to configure the target. (I'm using lio_node from Git.)

The pacemaker config lines look like this:
primitive iscsiLUNTest ocf:heartbeat:iSCSILogicalUnit \
        params lun="0" path="/dev/ReplicatedStorage/Test" target_iqn="iqn.2012-04.lan.storage:iscsi.storage"
primitive iscsiTarget1 ocf:heartbeat:iSCSITarget \
        params iqn="iqn.2012-04.lan.storage:iscsi.storage" implementation="lio" portals="10.122.11.100:3260 10.122.13.100:3260"
> 
> I can think of one recent change for iscsi-target wrt session
> reference counting that could be causing this type of regression in
> lio-core.git HEAD and mainline v3.4-rc1, but I don't see how it would
> affect v3.2.x stable code.
> 
> Is there anything else special about the workload and/or configuration
> required to trigger this bug that you've noticed during your testing?
I would say there is nothing special. :-)
Currently there is no workload, because the system is still in beta testing.
I have never used more than one iSCSI connection concurrently.

I can reproduce the problem on this hardware as well as in my QEMU virtual
installation.

And it seems that I'm not the only one who has this problem. Google turned
up this Pastebin, http://pastebin.com/26k47QKp, from a Gentoo machine, showing
a similar Kernel Oops. But I couldn't figure out who reported it.

Regards,
Henning
> 
> Thanks,
> 
> --nab