Hallo Nab,

I'll try to reproduce this over the weekend; I'm not sure that I can, but
I'll do my best. Some more information about what we did:

- The network configuration was as always: two bonds with two links each,
  using MAC hashing, and one IP per bond. The portal exposed both IPs.
  (A rough sketch of this layout is appended at the end of this mail.)

- We had iSCSI port binding with round robin enabled, so we used multiple
  iSCSI sessions per ESX server (4 active per LUN). (Also sketched at the
  end of this mail.)

- I dropped the buffer cache of the target to demonstrate how to free up
  memory fast. The following lines are from the scrollback buffer of my
  screen session:

(node-62) [~/work/linux-2.6] free
             total       used       free     shared    buffers     cached
Mem:      66083628   65708924     374704          0     230056   63309488
-/+ buffers/cache:    2169380   63914248
Swap:            0          0          0
(node-62) [~/work/linux-2.6] / sithglan has logged on pts/0 from infra-vlan10.gmvl.de
(node-62) [~/work/linux-2.6] sync; echo 3 > /proc/sys/vm/drop_caches
free
(node-62) [~/work/linux-2.6] free
             total       used       free     shared    buffers     cached
Mem:      66083628     547728   65535900          0        980       7616
-/+ buffers/cache:      539132   65544496
Swap:            0          0          0
(node-62) [~/work/linux-2.6] dmesg | tail
[   11.899264] IPv6: ADDRCONF(NETDEV_CHANGE): bond1.101: link becomes ready
[   11.899542] IPv6: ADDRCONF(NETDEV_CHANGE): bond1.102: link becomes ready
[   12.315195] igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   12.398846] bonding: bond1: link status definitely up for interface eth2, 1000 Mbps full duplex.
[   14.587938] Bridge firewalling registered
[   14.665002] Rounding down aligned max_sectors from 4294967295 to 4294967288
[57628.200959] Detected MISCOMPARE for addr: ffff8805702cc000 buf: ffff880c4128a000
[57628.200965] Target/fileio: Send MISCOMPARE check condition and sense
[57628.336304] Detected MISCOMPARE for addr: ffff88066d090000 buf: ffff880c4128a000
[57628.336310] Target/fileio: Send MISCOMPARE check condition and sense
(node-62) [~/work/linux-2.6] exit
(node-62) [~/work/linux-2.6] free
             total       used       free     shared    buffers     cached
Mem:      66083628     600624   65483004          0       2848      83100
-/+ buffers/cache:      514676   65568952
Swap:            0          0          0

  The target was stable afterwards, and I did this one day before the I/O
  stall happened.

- Shortly before the All Paths Down event, we upgraded 8 ESX servers from
  5.1 GA to the newest 5.1 patch available.

- We had approx. 36 - 72 GB of static data (virtual machine hard disks) on
  the two LUNs in question.

- Shortly before, while, or shortly after the issue happened, I deployed an
  8 GB fully patched w2k3 VM.

This was the only incident we had. The other four days it was rock stable,
no issues whatsoever, even though we tried to stress it with rescans and
svMotions. I don't know if this information helps, but as I said, I'll do
my best to reproduce it.

We saw the issue on all 8 ESX servers: everything locked up until I
rebooted the target. Afterwards everything was fine; of course we had a
few timed-out tasks in vCenter, but those were only symptoms.

Cheers,
        Thomas
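P.S. Since the network setup may matter for the reproduction, here is a
minimal sketch of the bonding layout described above, for one of the two
bonds. The mode, interface names, and IP below are placeholders for
illustration, not our exact configuration; "MAC hash" corresponds to
xmit_hash_policy=layer2:

# Sketch of one bond (bond0); bond1 is set up the same way.
# Mode, NIC names, and IP are placeholders, not the production values.
modprobe bonding                                    # creates bond0 by default
echo balance-xor > /sys/class/net/bond0/bonding/mode
echo layer2 > /sys/class/net/bond0/bonding/xmit_hash_policy  # MAC-based hashing
echo +eth0 > /sys/class/net/bond0/bonding/slaves    # first link
echo +eth1 > /sys/class/net/bond0/bonding/slaves    # second link
ip addr add 192.0.2.10/24 dev bond0                 # one IP per bond (placeholder)
ip link set bond0 up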
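And a sketch of the ESX-side multipathing setup (software iSCSI port
binding plus round robin), using the ESXi 5.x esxcli syntax as far as I
recall. The vmhba/vmk names and the naa ID are placeholders:

# Bind two VMkernel ports to the software iSCSI adapter (port binding);
# vmhba33, vmk1, and vmk2 are placeholders for the real names.
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2

# Switch the path selection policy for a LUN to round robin
# (the naa ID is a placeholder).
esxcli storage nmp device set --device naa.XXXXXXXX --psp VMW_PSP_RR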
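For completeness, the drop_caches step shown in the scrollback above works
as documented in Documentation/sysctl/vm.txt: writing 1 drops the page
cache, 2 drops dentries and inodes, 3 drops both. The sync first matters
because only clean pages can be dropped:

sync                               # write back dirty pages first
echo 1 > /proc/sys/vm/drop_caches  # drop page cache only
echo 2 > /proc/sys/vm/drop_caches  # drop dentries and inodes
echo 3 > /proc/sys/vm/drop_caches  # drop both (what I ran above)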