On Tue, May 07, 2013 at 01:37:17AM -0600, Dave Chinner wrote:
> Argh, add the cc to Josef...
>
> On Tue, May 07, 2013 at 05:11:02PM +1000, Dave Chinner wrote:
> > Hi Josef,
> >
> > I was just looking at generic/311, and I think there's something
> > fundamentally wrong with the way it is checking the scratch device.
> >
> > You reported it was failing for internal test 19 on XFS, but I'm
> > seeing it fail after the first test or 2, randomly. It has never
> > made it past test 3. So I had a little bit of a closer look at its
> > structure. Essentially it is doing this (and the contents seen by
> > each step):
> >
> > scratch dev + mkfs
> > +-------------------------------+
> > overlay dm-flakey
> > D-------------------------------D
> > mount/write/kill/unmount dm-flakey
> > Dx-x-x-x-x-x-x------------------D
> >
> > All good up to here. Now, you can _check_scratch_fs which sees:
> >
> > scratch dev + check
> > +-------------------------------+
> >
> > i.e. it's not seeing all the changes written to dm-flakey and so
> > xfs_check is seeing corruption.
> >
> > After I realised this was stacking block devices and checking the
> > underlying block device, the cause was pretty obvious: scratch-dev
> > and dm-flakey have different address spaces, so changes written
> > through one address space will not be seen through the other address
> > space if there is stale cached data in the original address space.
> >
> > And that's exactly what is happening. This patch:
> >
> > --- a/tests/generic/311
> > +++ b/tests/generic/311
> > @@ -79,6 +79,7 @@ _mount_flakey()
> >  _unmount_flakey()
> >  {
> >  	$UMOUNT_PROG $SCRATCH_MNT
> > +	echo 3 > /proc/sys/vm/drop_caches
> >  }
> >
> >  _load_flakey_table()
> >
> > Makes the problem go away for xfs_check. But really, I don't like
> > the assumption that the test is built on - that writes through one
> > block device are visible through another. It's just asking for weird
> > problems.
> >
> > Is there some way that you can restructure this test so it doesn't
> > have this problem (e.g. do everything on dm-flakey)?
>

So I've made the following patch, which I think will do what you want.
It's kind of ugly, but we have such specific things for fsck that I
don't want to have to re-implement it all just for this test.

The thing is, I'm still seeing the failure with test 19 for xfs.
xfs_check always passes fine for me. The failure is in the part where
we re-mount the flakey device and then md5sum the file: the result is
the md5sum of an empty file and doesn't match the md5sum we take before
we unmount. All of that is done on the flakey device, so there's no
stale caching going on there.

Let me know what you think about this patch. I'm open to other, less
horrible options. Thanks,

Josef

index 2b3b569..f11119b
--- a/tests/generic/311
+++ b/tests/generic/311
@@ -125,7 +125,10 @@ _run_test()
 	#Unmount and fsck to make sure we got a valid fs after replay
 	_unmount_flakey
+	tmp=$SCRATCH_DEV
+	SCRATCH_DEV=$FLAKEY_DEV
 	_check_scratch_fs
+	SCRATCH_DEV=$tmp
 	[ $? -ne 0 ] && _fatal "fsck failed"
 	_mount_flakey
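
One detail of the save/override/restore pattern in the hunk above may be worth checking: `$?` holds the status of the most recent command, so once `SCRATCH_DEV=$tmp` has run it reports the status of that assignment and no longer that of `_check_scratch_fs`. The sketch below illustrates this outside of xfstests. Everything in it is a hypothetical stand-in (`fake_check`, the `dev` variable, the `/dev/real` and `/dev/flakey` names); it is not the test's actual code, only a small demonstration of the shell behaviour.

```shell
#!/bin/bash
# Hypothetical stand-in for _check_scratch_fs: always reports failure.
fake_check() { return 1; }

# Same ordering as the hunk: $? is read after the restoring assignment,
# so it reflects the assignment (status 0), not fake_check.
status_after_assignment() {
    local dev=/dev/real tmp
    tmp=$dev
    dev=/dev/flakey
    fake_check
    dev=$tmp
    echo $?
}

# Variant that saves the check's status before restoring the variable.
status_saved_first() {
    local dev=/dev/real tmp ret
    tmp=$dev
    dev=/dev/flakey
    fake_check
    ret=$?
    dev=$tmp
    echo $ret
}

status_after_assignment   # prints 0: the failure is masked
status_saved_first        # prints 1: the failure is preserved
```

If the same holds in the test, saving the status in a variable right after `_check_scratch_fs` and testing that variable after the restore would keep the "fsck failed" check meaningful.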