On Thu, Oct 11, 2012 at 12:52:58PM -0500, Wayne Walker wrote:
> In short, I am able to: mkfs...; mount...; cp 1gbfile...; sync; cp
> 1gbfile...; sync # and now the xfs is corrupt
>
> I see multiple bugs
>
> 1. very simple, non-corner-case actions create a corrupted file system
> 2. corrupt data is knowingly written to the file system.
> 3. the file system stays online and writable
> 4. future write operations to the file system return success.
>
> Details:

.....

Nothing unusual there in the hardware. Seems sane to me.

> The exact commands to create the failure:
>
> /sbin/mkfs.xfs -f -l logdev=/dev/sda5 -b size=4096 -d su=1024k,sw=4
> /dev/sde1
> cat /etc/fstab
> mount -t xfs -o defaults,noatime,logdev=/dev/sda5 /dev/sde1 /dtfs_data/data1
> cp random_data.1G /dtfs_data/data1
> # returns 0
> sync
> # file system reported no failure yet
> cp random_data.1G /dtfs_data/data1
> # returns 0
> sync
> # file system reports stack trace, bad agf, and page discard

Ok, so having looked at the stack trace, the AGF block that was read
contained zeros, not valid metadata, which is why the allocation
failed.

Can you remake the filesystem at will? If so, can you run mkfs.xfs
as per above, then run the following commands?

# echo 3 > /proc/sys/vm/drop_caches
# for i in `seq 0 4`; do
> xfs_db -l /dev/sda5 -c "sb $i" -c p -c "agf $i" -c p /dev/sde1
> done

So that we can see what mkfs put on disk?

Can you then mount the filesystem, unmount it again, and run the
same commands?

Then mount the filesystem, run the copy/sync to trigger the error,
then unmount and run the commands again?

What I'm interested in is whether xfs_db sees the AGF (whichever one
it is) as zero, or whether only the kernel is seeing that.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
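
To make the three dumps (after mkfs, after mount/unmount, after the failing copy) quicker to compare, the AGF output could be checked with a small helper along the following lines. This is only a sketch and not part of the requested procedure: `agf_status` is a hypothetical name, and it assumes xfs_db prints the AGF magic as a `magicnum = ...` line, with 0x58414746 ("XAGF") being the valid value.

```shell
#!/bin/sh
# Hypothetical helper, not an xfsprogs tool. It reads the text printed by
# xfs_db's "p" command for a single AGF on stdin and classifies the magic.
# Assumption: the field is printed as "magicnum = <value>".
XFS_AGF_MAGIC=0x58414746   # ASCII "XAGF"

agf_status() {
    # Take the value from the first "magicnum = ..." line only.
    magic=$(awk '$1 == "magicnum" && $2 == "=" { print $3; exit }')
    case "$magic" in
        "$XFS_AGF_MAGIC") echo "valid" ;;
        0|0x0)            echo "zeroed" ;;
        "")               echo "no-magicnum" ;;
        *)                echo "unexpected:$magic" ;;
    esac
}

# Possible use on an unmounted filesystem, with the devices from this thread.
# Only the AGF is printed here, because the superblock has its own magicnum:
#
#   for i in `seq 0 4`; do
#       echo "agf $i: `xfs_db -l /dev/sda5 -c "agf $i" -c p /dev/sde1 | agf_status`"
#   done
```

A "zeroed" result from xfs_db would suggest the zeros are really on disk; "valid" after the failure would suggest only the kernel is seeing them.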