Hello,

I have a problem with occasional data corruption when using a 3.2.x LIO iSCSI target as SAN storage in VMware vSphere 5. My tests show that under special circumstances, some writes to the target seem to be partially lost. The problem is probably related to VMFS thin provisioning and causes random BSODs and filesystem corruption in guests.

In short: if I have two VMs on two different ESXi hosts that use the same LUN as a datastore for their VMDK disks, and one of the VMs has a growing thin-provisioned disk, then concurrent guest disk activity causes some writes in the _opposite_ VM to be lost.

The following setup seems to reliably reproduce the bug:

(1) Create a vSphere 5 cluster environment with two ESXi hosts, ESX1 and ESX2.
(2) Create two Linux virtual machines, VM1 located on ESX1's local disk and VM2 located on ESX2's local disk. It's important that they are on two different hosts!
(3) Create a clean new shared VMFS5 SAN datastore based on a LUN provided by the LIO iSCSI target.
(4) Create a 1GB VMDK disk on this LIO datastore and add it to VM1 as /dev/sdb.
(5) Sequentially fill VM1's /dev/sdb with a known pattern, say 4kB "AAAA" blocks.
(6) Re-read VM1's sdb to check that it really contains the "AAAA" pattern.
(7) Create a 1GB _thin provisioned_ disk on the LIO datastore and add it to VM2.
(8) In VM1, start the fill of /dev/sdb again, now with a "bbbb" pattern.
(9) At the same time, start a similar fill in VM2 that writes a "cccc" pattern to its /dev/sdb. It is important to start the fill in both VMs at the same time so that they write to their disks concurrently. At this point, VM1 is overwriting its fully allocated disk while VM2 is growing its thin disk and filling it.
(10) Re-read VM2's sdb disk - it contains the "cccc" pattern, there's no problem.
(11) Re-read VM1's sdb disk - instead of a contiguous "bbbb" pattern, there are rare occurrences of pieces of the original "AAAA" pattern, which means that some of the "bbbb" writes were only partially written or weren't written at all!

I'm not sure that this is the only possible scenario, but at least it's 100% reproducible for me.

Notes:

(*) Only kernels >= 3.2 seem to be affected. I regularly reproduce the bug with vanilla 3.2.0, stable 3.2.9, and the latest vanilla 3.3-rcX from git. On the other hand, vanilla 3.1.0 is always OK. So the bug was probably introduced in 3.2.

(*) The bug occurs only during the on-demand growth of thin VMDK disks. When I repeat the test with VMDKs that are already fully allocated, everything is OK. As VMDK thin growing involves SCSI-2 reservations for cluster-wide locking, maybe these reservations somehow interfere with the writes from the other session?

(*) It is necessary to run the test from _two_ ESXi hosts. If VM1 and VM2 are on the same host, there is no problem.

(*) The problem is not related to missing WRITE_SAME support in LIO. I perform all tests with the ESXi DataMover.HardwareAcceleratedInit option turned off.

(*) There is no evidence of VMFS5 filesystem metadata corruption, and there are no errors in the ESXi or LIO logs.

(*) A sequence of unwritten data always ends at a 4kB-aligned offset, regardless of the pattern size. This also means that only parts of writes are lost, probably in 4kB units. On the other hand, the gap's start offset depends on the pattern size: even if I write blocks of a non-power-of-two size, an unwritten sequence always starts after a fully written block. (However, I'm not sure how much this is affected by the guest block layer.)
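For reference, the fill/verify steps (5), (6), (8) and (11) can be done with something like the following - only a minimal Python sketch; the script name, device path, pattern and 4kB block size are just the values used above, and dd plus a compare would of course work as well. Run it as root inside the guest with "fill" to write the pattern, and without arguments to verify:

#!/usr/bin/env python3
# Minimal sketch of the in-guest fill/verify test.
# Device path, pattern and block size are just the values from the test above;
# run as root, e.g. "./filltest.py fill" to write, "./filltest.py" to verify.
import os
import sys

DEV = "/dev/sdb"                  # 1GB test disk inside the VM
BLOCK = 4096                      # fill/verify unit
PATTERN = b"bbbb" * (BLOCK // 4)  # "AAAA", "bbbb", "cccc", ... per run

def device_size(dev):
    # lseek(SEEK_END) on the block device returns its size in bytes
    with open(dev, "rb") as f:
        return f.seek(0, os.SEEK_END)

def fill(dev, pattern):
    # sequentially overwrite the whole device with the pattern
    blocks = device_size(dev) // BLOCK
    with open(dev, "r+b") as f:
        for _ in range(blocks):
            f.write(pattern)
        f.flush()
        os.fsync(f.fileno())

def verify(dev, pattern):
    # re-read the device and report every 4kB block that lost the pattern
    blocks = device_size(dev) // BLOCK
    with open(dev, "rb") as f:
        for i in range(blocks):
            data = f.read(BLOCK)
            if data != pattern:
                print("mismatch at offset %d: %r..." % (i * BLOCK, data[:8]))

if __name__ == "__main__":
    if sys.argv[1:] == ["fill"]:
        fill(DEV, PATTERN)
    else:
        verify(DEV, PATTERN)

Comparing in 4kB units is enough here, since the unwritten gaps always end on 4kB boundaries (see the note above).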
(*) The number of unwritten gaps is random, but I always get about 15-20 gaps with the above test.

(*) All tests were performed with an LIO iblock device backed by an LVM volume. Each ESXi host uses two 1GbE NICs with the round-robin path selection policy. On the target side, there are two network portals as well. Jumbo frames are enabled everywhere.

Does anybody have an idea what's wrong?

Regards

Martin