On Tue, 2012-04-03 at 08:56 +0200, Henning Becker wrote:
> On Monday, 2 April 2012, 19:27:25 you wrote:
> > On Sat, 2012-03-31 at 21:49 +0200, Henning Becker wrote:
> > > Hello,
> > > I'm using LIO iSCSI target on top of a pacemaker cluster, to provide a
> > > redundant replicated storage.
> > >
> > > I randomly get Kernel Oops in transport_free_dev_tasks, while moving the
> > > target from one node to the other.
> > >
> > > Kernel log says the following (Kernel 3.3.0-rc6):
> > > http://pastebin.com/tvm3tK7Z Another log (Kernel 3.2.0):
> > > http://pastebin.com/wMNER3We
> > >
> > > Distribution is Debian and it seems only to happen, if there is an iscsi
> > > connection.
> > >
> > > Any hints?
> >
> > Hello Henning,
> >
> > It would be helpful to know a bit more about the target configuration
> > that is triggering this bug, and what the cluster resource callbacks are
> > being invoked to individual /sys/kernel/config/target/iscsi/ endpoint
> > shutdown to perform the move..
> Hello Nicholas,
> I've written an inotify log of /sys/kernel/config for you. It's here:
> http://pastebin.com/vNEe6vR5
>

Hi again Henning,

Thanks for the setup info and the nice inotify log.

> It seems, the Oops happens while disabling target (writing "0" to
> tpgt_1/enable)
>

Ok, please verify if this session is attached to an explicit fabric
initiator NodeACL (iSCSI InitiatorName) configfs group, or attached to a
dynamically generated se_node_acl->acl_group using the TPG attrib
generate_node_acl=1 to allow 'demo mode' operation (eg: all initiators
can login to the endpoint)..?

> Configuration is nothing special. I'm using the pacemaker services iscsiLUN
> and iscsiTarget to configure my LUNs and my target. These services use
> lio_node to configure the target.
> (I'm using lio_node from GIT)
>
> The pacemaker config lines look like this:
> primitive iscsiLUNTest ocf:heartbeat:iSCSILogicalUnit \
> params lun="0" path="/dev/ReplicatedStorage/Test"
> target_iqn="iqn.2012-04.lan.storage:iscsi.storage"
> primitive iscsiTarget1 ocf:heartbeat:iSCSITarget \
> params iqn="iqn.2012-04.lan.storage:iscsi.storage"
> implementation="lio" portals="10.122.11.100:3260 10.122.13.100:3260"
>
> > I can think of one recent change for iscsi-target wrt session
> > reference counting that could be causing this type of regression in
> > lio-core.git HEAD and mainline v3.4-rc1, but I don't see how it would
> > affect v3.2.x stable code..
> >
> > Is there anything else special about the work-load and/or configuration
> > required to trigger this bug you've noticed during your testing..?
> I would say, there is nothing special. :-)
> Currently, there is no work load because the system is still in beta testing.
> I never used more than one iSCSI connection concurrently.
>
> I can reproduce the problem on this hardware as well as on my qemu virtual
> installation.
>
> And it seems, that I'm not the only one, who has this problem. Google has
> found this Pastebin http://pastebin.com/26k47QKp of a gentoo machine, showing
> a similar Kernel Oops. But I didn't figure out, to whom this bug belongs to.
>

Thanks for the additional pointer on this bug.. I have a few ideas
where to look, and will take a look at reproducing this soon.

Please let me know wrt explicit NodeACL vs. demo-mode TPG usage. ;)

Thanks,

--nab
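
For anyone wanting to answer the NodeACL vs. demo-mode question on their
own setup, here is a rough sketch of one way to inspect a TPG from
userspace. It assumes the usual LIO configfs layout, where the TPG
directory contains attrib/generate_node_acl and an acls/ directory with
one subdirectory per explicit initiator NodeACL. The function name
describe_tpg_acl_mode is made up for illustration and is not part of
lio_node or any other LIO tool. Note that it only reports how the TPG is
configured; it does not show which ACL a given live session is attached to.

```python
from pathlib import Path


def describe_tpg_acl_mode(tpg_dir):
    """Summarise demo-mode and explicit NodeACL configuration of one TPG.

    tpg_dir is the configfs directory of the TPG (hypothetical helper,
    assuming the standard attrib/ and acls/ layout).
    """
    tpg = Path(tpg_dir)

    # generate_node_acl=1 means 'demo mode': ACLs are generated dynamically
    # for initiators that have no explicit NodeACL.
    attr = tpg / "attrib" / "generate_node_acl"
    demo_mode = attr.read_text().strip() == "1"

    # Each explicit NodeACL is a directory named after the InitiatorName.
    acls_dir = tpg / "acls"
    explicit_acls = []
    if acls_dir.is_dir():
        explicit_acls = sorted(p.name for p in acls_dir.iterdir() if p.is_dir())

    return {"demo_mode": demo_mode, "explicit_acls": explicit_acls}
```

With the IQN and TPG tag from this thread, the directory to pass would
presumably be
/sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/tpgt_1
(run as root on the node that currently holds the target).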