Hello,

On Thu 26-09-13 08:22:40, James Dingwall wrote:
> >Hi,
> >
> >We have observed a data corruption bug in a database created by
> >the postmap command (BDB file) under the following conditions:
> >
> >Xen domU guest kernel 3.8, 3.9 (3.5, 3.10, 3.11 don't show the
> >behaviour; 3.6 and 3.7 are unknown)
> >dom0 Xen 4.2.1 / kernel 3.8 or Xen 4.3.0 / kernel 3.11
> >The guest has a passed through block device (phy:/ or file:/)
> >The filesystem on the passed through device is ext2/3/4 with a 1k
> >block size

Thanks for the report! So have you really tried with all three filesystems?
And don't you have EXT4_USE_FOR_EXT23 set by any chance? There were some
changes to the ext4 writeback path and extent status tree, so for ext4 I
could understand that the problem got introduced and fixed. But ext2/3
haven't seen any significant changes for a long time...

> >By examining a strace of the postmap command we produced a short
> >piece of code (at the bottom) which demonstrates the problem. If
> >this is executed in a loop such as:
> >
> >#!/bin/bash
> >for i in $(seq 1 5) ; do
> >    mount /dev/xvde1 /mnt
> >    pushd /mnt > /dev/null
> >    echo "checksums after mount"
> >    md5sum testcase.bin
> >    [ "${i}" = "1" ] && ./a.out
> >    echo "checksums before umount"
> >    md5sum testcase.bin
> >    popd > /dev/null
> >    umount /mnt
> >done

I'll see if I can reproduce this to investigate.

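On the EXT4_USE_FOR_EXT23 question above, a quick way to check might look like the sketch below. The helper name ext23_via_ext4 is hypothetical, and the config file location is an assumption that varies between distributions.

```shell
#!/bin/bash
# ext23_via_ext4 CONFIG_FILE   (hypothetical helper, not part of the report)
# Succeeds if the given kernel config has CONFIG_EXT4_USE_FOR_EXT23=y,
# i.e. ext2/ext3 mounts are actually served by the ext4 driver.
ext23_via_ext4() {
    grep -q '^CONFIG_EXT4_USE_FOR_EXT23=y' "$1"
}

# Usage in the guest. The path below is an assumption; some kernels expose
# the config as /proc/config.gz instead, which would need zcat first:
#   ext23_via_ext4 "/boot/config-$(uname -r)" && echo "ext2/3 handled by ext4"
```

If the option turns out to be set, the ext2 and ext3 results would really be ext4 results, which would fit the writeback changes mentioned above.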
> >The output is
> >
> >checksums after mount
> >md5sum: testcase.bin: No such file or directory
> >checksums before umount
> >719f20c98b69457ce0247d6bf4474cf9  testcase.bin  # the correct checksum for the file
> >checksums after mount
> >a90804e64bcc1c0c98dd2cb23d0e4c10  testcase.bin
> >checksums before umount
> >a90804e64bcc1c0c98dd2cb23d0e4c10  testcase.bin
> >checksums after mount
> >14bb035eca1ec516ce3865700536fc0c  testcase.bin
> >checksums before umount
> >14bb035eca1ec516ce3865700536fc0c  testcase.bin
> >checksums after mount
> >124d3d3ea8e421925825ff94a815630b  testcase.bin
> >checksums before umount
> >124d3d3ea8e421925825ff94a815630b  testcase.bin
> >checksums after mount
> >7c05f36ffdd6b8217a27c0bd4d9cb531  testcase.bin
> >checksums before umount
> >7c05f36ffdd6b8217a27c0bd4d9cb531  testcase.bin
> >
> >If we dd out the block device and then loop mount the resulting
> >file we do not see this problem, suggesting that communication
> >between xen block back/front is ok and that it is only when the
> >mount takes place that there is a problem. The default libdb
> >behaviour seems to be to create a database with a block size
> >matching that of the filesystem; if we override this and set it at
> >4k we do not see this issue. This is also observed by changing
> >the bs value in our test program. Once bs is > 3072 we no longer
> >observe the problem. Also we can avoid the issue in our test
> >program by filling in the hole while __testcase.bin is being
> >generated. A similar test on xfs with a 1k block size did not
> >demonstrate this problem. If we make a cp of the file before the
> >umount then the copied version is and remains correct.
> >
> >Our searching does not seem to have revealed any similar reports
> >or an explicitly identified fix that was introduced for 3.10. Our
> >concern therefore is that this is an unrecognised failure that has
> >been inadvertently fixed and could equally inadvertently be
> >reintroduced by some other change.
> >If this problem sounds
> >familiar or there are suggestions on how to narrow this down
> >further we would greatly appreciate the advice.

Well, you can always use 'git bisect' to find the commit that fixed this.

								Honza

> >#include <string.h>
> >#include <stdio.h>
> >#include <fcntl.h>
> >#include <stdlib.h>
> >#include <unistd.h>    /* lseek, write, pread, pwrite, fdatasync, close */
> >#include <sys/file.h>  /* flock */
> >#include <sys/stat.h>
> >
> >extern
> >int main(int argc, char *argv[])
> >{
> >    struct stat *sbuf;
> >    char *buf, *zero, *null;
> >    int fd5, fd6, fd7;
> >    int i;
> >    int bs = 1024; /* lte 3072 = corruption */
> >
> >
> >    buf = malloc(3*bs);
> >    zero = malloc(3*bs);
> >    null = malloc(bs);
> >    memset(zero, 0, 3*bs);
> >    sbuf = malloc(sizeof(struct stat));
> >    memset(sbuf, 0, sizeof(struct stat));
> >
> >    for(i = 0; i < 3*bs; i++) {
> >        buf[i] = i & 0x000f;
> >    }
> >
> >    fd5 = open("__testcase.bin", O_RDWR|O_CREAT|O_EXCL, 0644);
> >    //fcntl(fd5, F_GETFD);
> >    //fcntl(fd5, F_SETFD, FD_CLOEXEC);
> >    //stat("__testcase.bin", sbuf);
> >    fstat(fd5, sbuf);
> >    /* this only writes the first and last blocks */
> >    lseek(fd5, 0*bs, SEEK_SET);
> >    write(fd5, zero, bs);
> >    //lseek(fd5, 1*bs, SEEK_SET); /* filling in this hole is a fix! */
> >    //write(fd5, zero, bs);
> >    lseek(fd5, 2*bs, SEEK_SET);
> >    write(fd5, zero, bs);
> >    fdatasync(fd5);
> >    rename("__testcase.bin", "testcase.bin");
> >
> >    //stat("testcase.bin", sbuf);
> >    fd6 = open("testcase.bin", O_RDWR|O_CREAT, 0);
> >    //fcntl(fd6, F_GETFD);
> >    //fcntl(fd6, F_SETFD, FD_CLOEXEC);
> >    //fstat(fd6, sbuf);
> >    pread(fd6, null, bs, 0);
> >    //fstat(fd6, sbuf);
> >    //fcntl(fd6, F_GETFD);
> >    //fcntl(fd6, F_SETFD, FD_CLOEXEC);
> >    //fcntl(fd6, F_GETFD);
> >    //fcntl(fd6, F_SETFD, FD_CLOEXEC);
> >    fd7 = open("testcase.bin", O_RDWR);
> >    flock(fd7, LOCK_EX);
> >    umask(022);
> >    pread(fd6, null, bs, 1*bs);
> >    pread(fd6, null, bs, 2*bs);
> >    pwrite(fd6, buf, bs, 0*bs);
> >    pwrite(fd6, buf, bs, 1*bs);
> >    pwrite(fd6, buf, bs, 2*bs);
> >    fdatasync(fd6);
> >    fdatasync(fd6);
> >    close(fd5);
> >    close(fd6);
> >
> >    fd5 = open("testcase.bin", O_RDWR, 0);
> >    //fcntl(fd5, F_GETFD);
> >    //fcntl(fd5, F_SETFD, FD_CLOEXEC);
> >    fdatasync(fd5);
> >    close(fd5);
> >
> >    close(fd7);
> >
> >    free(buf);
> >    free(sbuf);
> >    free(zero);
> >    free(null);
> >}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
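
P.S. On the 'git bisect' suggestion above: since the goal is the commit that fixed the bug rather than the one that introduced it, the usual good/bad meaning is inverted. Below is a rough sketch, assuming a git recent enough to support custom bisect terms (--term-old / --term-new). The helper bisect_verdict is hypothetical and not part of the original test case; the checksum is the "correct" one quoted in the report.

```shell
#!/bin/bash
# Correct checksum of testcase.bin as quoted in the report.
GOOD_SUM=719f20c98b69457ce0247d6bf4474cf9

# bisect_verdict FILE [EXPECTED_MD5]   (hypothetical helper)
# Prints "fixed" if FILE has the expected checksum, "broken" otherwise
# (including when FILE cannot be read).
bisect_verdict() {
    local file=$1 expected=${2:-$GOOD_SUM} actual
    actual=$(md5sum "$file" 2>/dev/null | cut -d' ' -f1) || true
    if [ "$actual" = "$expected" ]; then
        echo fixed
    else
        echo broken
    fi
}

# On the build host, in the kernel tree (old = still corrupts, new = fixed):
#   git bisect start --term-old=broken --term-new=fixed
#   git bisect broken v3.9
#   git bisect fixed v3.10
# For each commit git checks out: build it, boot the domU with it, run the
# test loop, and after a remount run in the guest:
#   bisect_verdict /mnt/testcase.bin
# then report the printed word on the build host with
#   git bisect fixed      or      git bisect broken
```

Each step needs a fresh filesystem (or at least a removed testcase.bin), since the test program creates the file with O_EXCL.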