Re: Kernel Oops while closing iSCSI connection [transport_free_dev_tasks]

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Wed, 04 Apr 2012 14:08:36 -0700

On Tue, 2012-04-03 at 08:56 +0200, Henning Becker wrote:
> On Monday, 2 April 2012, 19:27:25 you wrote:
> > On Sat, 2012-03-31 at 21:49 +0200, Henning Becker wrote:
> > > Hello,
> > > I'm using the LIO iSCSI target on top of a pacemaker cluster to provide
> > > redundant, replicated storage.
> > > 
> > > I randomly get a kernel Oops in transport_free_dev_tasks while moving the
> > > target from one node to the other.
> > > 
> > > Kernel log says the following (Kernel 3.3.0-rc6):
> > > http://pastebin.com/tvm3tK7Z
> > > Another log (Kernel 3.2.0):
> > > http://pastebin.com/wMNER3We
> > > 
> > > The distribution is Debian, and it seems to happen only if there is an
> > > active iSCSI connection.
> > > 
> > > Any hints?
> > 
> > Hello Henning,
> > 
> > It would be helpful to know a bit more about the target configuration
> > that is triggering this bug, and which cluster resource callbacks are
> > being invoked to shut down the individual /sys/kernel/config/target/iscsi/
> > endpoints in order to perform the move..
> Hello Nicholas,
> I've written an inotify log of /sys/kernel/config for you. It's here:
> http://pastebin.com/vNEe6vR5
> 

Hi again Henning,

Thanks for the setup info and the nice inotify log.

> It seems the Oops happens while disabling the target (writing "0" to
> tpgt_1/enable)
> 

Ok, please verify if this session is attached to an explicit fabric
initiator NodeACL (iSCSI InitiatorName) configfs group, or attached to a
dynamically generated se_node_acl->acl_group using the TPG attrib
generate_node_acls=1 to allow 'demo mode' operation (eg: all initiators
can login to the endpoint)..?
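
One way to check this is a small script along the following lines. It is
only a sketch, not an official tool: the function name is made up for
illustration, and it assumes the usual configfs layout where each TPG
directory has an attrib/generate_node_acls file and an acls/ directory
containing one subdirectory per explicit InitiatorName. Verify those
paths on your kernel before relying on the result.

```python
import os


def describe_tpg_acl_mode(tpg_dir):
    """Report demo-mode setting and explicit NodeACLs for one TPG directory.

    tpg_dir is assumed to look like
    /sys/kernel/config/target/iscsi/<target IQN>/tpgt_1
    (layout assumed, not taken from this thread).
    """
    # attrib/generate_node_acls holds "1" when demo mode is enabled (assumed name).
    attrib_path = os.path.join(tpg_dir, "attrib", "generate_node_acls")
    with open(attrib_path) as f:
        demo_mode = f.read().strip() == "1"

    # Each explicit NodeACL appears as a subdirectory of acls/ named after
    # the initiator's IQN.
    acls_dir = os.path.join(tpg_dir, "acls")
    explicit_acls = []
    if os.path.isdir(acls_dir):
        explicit_acls = sorted(
            name
            for name in os.listdir(acls_dir)
            if os.path.isdir(os.path.join(acls_dir, name))
        )

    return {"demo_mode": demo_mode, "explicit_acls": explicit_acls}
```

Calling describe_tpg_acl_mode() on the tpgt_1 directory of the target and
printing the result should show whether demo mode is on and which
InitiatorNames, if any, have explicit ACL groups. It does not tell you
which of the two a live session is actually using, so please still compare
the list against the initiator's InitiatorName.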

> Configuration is nothing special. I'm using the pacemaker services iscsiLUN 
> and iscsiTarget to configure my LUNs and my target. These services use 
> lio_node to configure the target. (I'm using lio_node from GIT) 
> 
> The pacemaker config lines look like this:
> primitive iscsiLUNTest ocf:heartbeat:iSCSILogicalUnit \
>         params lun="0" path="/dev/ReplicatedStorage/Test" target_iqn="iqn.2012-04.lan.storage:iscsi.storage"
> primitive iscsiTarget1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-04.lan.storage:iscsi.storage" implementation="lio" portals="10.122.11.100:3260 10.122.13.100:3260"
> > 
> > I can think of one recent change for iscsi-target wrt session
> > reference counting that could be causing this type of regression in
> > lio-core.git HEAD and mainline v3.4-rc1, but I don't see how it would
> > affect v3.2.x stable code..
> > 
> > Is there anything else special about the work-load and/or configuration
> > required to trigger this bug that you've noticed during your testing..?
> I would say, there is nothing special. :-)
> Currently, there is no work load because the system is still in beta testing. 
> I never used more than one iSCSI connection concurrently.
> 
> I can reproduce the problem on this hardware as well as on my qemu virtual 
> installation.
> 
> And it seems that I'm not the only one who has this problem. Google
> found this Pastebin http://pastebin.com/26k47QKp from a Gentoo machine, showing
> a similar kernel Oops. But I couldn't figure out whose report it is.
> 

Thanks for the additional pointer on this bug..  I have a few ideas on
where to look, and will try to reproduce this soon.  Please
let me know wrt explicit NodeACL vs. demo-mode TPG usage.  ;)

Thanks,

--nab
