I'm observing an interesting data corruption pattern:

- write a bunch of files
- power cycle the box
- remount
- immediately (within 1-2 seconds) create a file and
  - write to a lower offset, say offset 430423 len 527614
  - write to a higher offset, say offset 1360810 len 269613
  (there is other random IO going to other files too)
- about 5 seconds later, read the whole file and verify its content

And what I see:

- the first region is correct and intact
- the bytes that follow, up until the block boundary, are 0
- the next few blocks are *not* zero! (I've observed both 1 and 6 4k blocks)
- then lots of zeros, up until the second region, which appears intact.

I'm pretty reliably hitting this, and have reproduced it twice now with the above consistent pattern (but different filenames and different offsets). What I haven't yet confirmed is whether the file was written at all prior to the power cycle, since that tends to blow away the last bit of the ceph logs, too. I'm adding some additional checks to see whether the file is in fact new when the first extent is written.

The other possibly interesting thing is the offsets. The garbage regions I saw were

  0xea000 - 0xf0000
  0xff000 - 0x100000

Does this failure pattern look familiar to anyone? I'm pretty sure it is new in 3.9, which we switched over to right around the time this started happening. I'm confirming that as well, but wanted to see if this rings any bells...

Thanks!
sage

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
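[Editor's sketch] The two-write-then-verify sequence in the post could be checked with a small script like the one below. It is a minimal sketch, not the reporter's actual test harness: the pattern function, file path handling, and the fsync call are all assumptions; only the offsets/lengths and the 4k block granularity come from the report. The idea is to write a recognizable pattern at the two regions (leaving a hole between them) and, on read-back, flag any unwritten 4k block that is not all zeros, i.e. the "*not* zero!" garbage blocks described above.

```python
import os

BLOCK = 4096  # 4k block size, matching the block granularity in the report

def pattern(off, length):
    # Deterministic, offset-dependent pattern so corruption is detectable.
    return bytes((off + i) & 0xFF for i in range(length))

def write_regions(path, regions):
    # Write a pattern at each (offset, length) region, leaving a sparse
    # hole between them, then fsync so a clean run has no excuse to lose data.
    with open(path, "wb") as f:
        for off, length in regions:
            f.seek(off)
            f.write(pattern(off, length))
        f.flush()
        os.fsync(f.fileno())

def verify_regions(path, regions):
    # Return a list of (block_index, kind) anomalies:
    #   "region-corrupt"  - a written region does not match its pattern
    #   "garbage-in-hole" - an unwritten block reads back non-zero
    anomalies = []
    with open(path, "rb") as f:
        data = f.read()
    covered = bytearray(len(data))  # 1 where a byte was explicitly written
    for off, length in regions:
        if data[off:off + length] != pattern(off, length):
            anomalies.append((off // BLOCK, "region-corrupt"))
        for i in range(off, off + length):
            covered[i] = 1
    for b in range(0, len(data), BLOCK):
        blk = range(b, min(b + BLOCK, len(data)))
        if any(covered[i] for i in blk):
            continue  # block overlaps a written region; checked above
        if any(data[i] for i in blk):
            anomalies.append((b // BLOCK, "garbage-in-hole"))
    return anomalies
```

On a healthy filesystem (no power cut), `verify_regions` should return an empty list for the offsets from the report; after the power-cycle scenario described above, the failure would show up as `garbage-in-hole` entries in the hole between the two regions.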