Hi guys,

I reproduced this on two more boxes and have more data.  The full set of
notes/logs is at

	http://newdream.net/~sage/bug-4976/notes.txt

I stashed a copy of the ceph log and the file itself for each case too:

	http://newdream.net/~sage/bug-4976/

The new information:

- the file was created and allocated prior to the powercycle.
- the writes were then replayed from the ceph journal after restart
- garbage data appears at offsets we never wrote to

If this pattern doesn't suggest any likely theories, I could rig things
up to try to capture the ceph log output leading up to the crash, so I
can positively confirm the sequence of events before the power cycle.

Any ideas?

Thanks!
sage

On Tue, 4 Jun 2013, Eric Sandeen wrote:
> On 6/4/13 2:24 PM, Sage Weil wrote:
> > I'm observing an interesting data corruption pattern:
> > 
> > - write a bunch of files
> > - power cycle the box
> 
> I guess this part is important?  But I'm wondering why...
> 
> > - remount
> > - immediately (within 1-2 seconds) create a file and
> 
> a new file, right?

It was created and written to (w/ the same pattern) before the crash.
We then repeat the sequence afterward, when replaying the ceph journal.

> > - write to a lower offset, say offset 430423 len 527614
> > - write to a higher offset, say offset 1360810 len 269613
> >   (there is other random io going to other files too)
> > 
> > - about 5 seconds later, read the whole file and verify content
> > 
> > And what I see:
> > 
> > - the first region is correct, and intact
> 
> the lower offset you wrote?

Right

> > - the bytes that follow, up until the block boundary, are 0
> 
> that's good ;)
> 
> > - the next few blocks are *not* zero!  (i've observed 1 and 6 4k blocks)
> 
> that's bad!
> 
> > - then lots of zeros, up until the second region, which appears intact.
> 
> the lot-of-zeros are probably holes?

Right

> What does xfs_bmap -vvp <filename> say about the file in question?

The notes.txt file linked above has the bmap output.

Thanks!
sage

> > I'm pretty reliably hitting this, and have reproduced it twice now and
> > found the above consistent pattern (but different filenames, different
> > offsets).  What I haven't yet confirmed is whether the file was written
> > at all prior to the powercycle, since that tends to blow away the last
> > bit of the ceph logs, too.  I'm adding some additional checks to see
> > whether the file is in fact new when the first extent is written.
> > 
> > The other possibly interesting thing is the offsets.  The garbage
> > regions I saw were
> > 
> > 0xea000 - 0xf0000
> 
> 234-240 4k blocks
> 
> > 0xff000 - 0x100000
> 
> 255-256 4k blocks  *shrug*
> 
> Is this what you saw w/ the write offsets & sizes you specified above?
> 
> I'm wondering if this could possibly have to do w/ speculative
> preallocation on the file somehow exposing these blocks?  But that's
> just handwaving.
> 
> -Eric
> 
> > Does this failure pattern look familiar to anyone?  I'm pretty sure it
> > is new in 3.9, which we switched over to right around the time when
> > this started happening.  I'm confirming that as well, but just wanted
> > to see if this is ringing any bells...
> > 
> > Thanks!
> > sage
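
For reference, here is a minimal standalone sketch of the write/verify
pattern discussed above.  This is not the actual ceph test path (which
goes through the OSD journal and object store); the filename, pattern
bytes, fsync, and 5-second delay are placeholders/assumptions, only the
offsets and lengths come from the report.

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* offsets/lengths quoted in the report */
#define LO_OFF 430423UL
#define LO_LEN 527614UL
#define HI_OFF 1360810UL
#define HI_LEN 269613UL

static void write_region(int fd, off_t off, size_t len, char pattern)
{
	char *buf = malloc(len);
	memset(buf, pattern, len);
	if (pwrite(fd, buf, len, off) != (ssize_t)len) {
		perror("pwrite");
		exit(1);
	}
	free(buf);
}

static void check_region(int fd, off_t off, size_t len, char expect)
{
	char *buf = malloc(len);
	ssize_t i, got = pread(fd, buf, len, off);
	for (i = 0; i < got; i++) {
		if (buf[i] != expect) {
			printf("mismatch at offset %lld\n",
			       (long long)(off + i));
			break;
		}
	}
	free(buf);
}

int main(void)
{
	int fd = open("testfile", O_CREAT | O_RDWR, 0644); /* name assumed */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	write_region(fd, LO_OFF, LO_LEN, 'A');
	write_region(fd, HI_OFF, HI_LEN, 'B');
	fsync(fd);			/* assumption; real test syncs via the osd journal */

	sleep(5);			/* "about 5 seconds later" */

	check_region(fd, LO_OFF, LO_LEN, 'A');	/* first written region */
	check_region(fd, HI_OFF, HI_LEN, 'B');	/* second written region */
	/* the gap between the two regions should read back as zeros (a hole) */
	check_region(fd, LO_OFF + LO_LEN, HI_OFF - (LO_OFF + LO_LEN), 0);

	close(fd);
	return 0;
}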