On Fri, 22 Mar 2013, Chris Mason wrote:
> Quoting Alexandre Oliva (2013-03-22 10:17:30)
> > On Mar 22, 2013, Chris Mason <clmason@xxxxxxxxxxxx> wrote:
> >
> > > Are you using compression in btrfs or just in leveldb?
> >
> > btrfs lzo compression.
>
> Perfect, I'll focus on that part of things.
>
> > > I'd like to take snapshots out of the picture for a minute.
> >
> > That's understandable, I guess, but I don't know that anyone has ever
> > got the problem without snapshots.  I mean, even when the master copy of
> > the database got corrupted, snapshots of the subvol containing it were
> > being taken every now and again, because that's the way ceph works.
>
> Hopefully Sage can comment, but the basic idea is that if you snapshot a
> database file the db must participate.  If it doesn't, it really is the
> same effect as crashing the box.
>
> Something is definitely broken if we're corrupting the source files
> (either with or without snapshots), but avoiding incomplete writes in
> the snapshot files requires synchronization with the db.

In this case, we quiesce write activity, call leveldb's sync(), take the
snapshot, and then continue.

(FWIW, this isn't the first time we've heard about leveldb corruption, but
in each case we've looked into the user had the btrfs compression
enabled.... so I suspect that's the right avenue of investigation!)

sage
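
For illustration only, here is a rough Python sketch of the quiesce / sync / snapshot / continue ordering described above. It is not ceph's or leveldb's actual code, and every name in it is an assumption: QuiescedStore and demo are hypothetical, os.fsync stands in for leveldb's sync(), and shutil.copytree stands in for a btrfs subvolume snapshot.

```python
import os
import shutil
import tempfile
import threading


class QuiescedStore:
    """Toy append-only store standing in for the database (hypothetical)."""

    def __init__(self, directory):
        self.path = os.path.join(directory, "data.log")
        # Writers and the snapshotter share this lock, so holding it
        # quiesces write activity.
        self._lock = threading.Lock()
        self._file = open(self.path, "ab")

    def write(self, record):
        with self._lock:
            self._file.write(record + b"\n")

    def snapshot(self, take_snapshot):
        # 1. Quiesce: no new writes can start while the lock is held.
        with self._lock:
            # 2. Sync: flush user-space buffers, then ask the kernel to
            #    persist them (stand-in for leveldb's sync()).
            self._file.flush()
            os.fsync(self._file.fileno())
            # 3. Take the snapshot while nothing is in flight.
            take_snapshot()
        # 4. Lock released: writers continue.

    def close(self):
        self._file.close()


def demo():
    base = tempfile.mkdtemp()
    src = os.path.join(base, "current")
    snap = os.path.join(base, "snap_1")
    os.mkdir(src)

    store = QuiescedStore(src)
    store.write(b"a")
    store.write(b"b")
    # copytree is only a stand-in for a real filesystem snapshot.
    store.snapshot(lambda: shutil.copytree(src, snap))
    store.write(b"c")  # lands after the snapshot point
    store.close()

    with open(os.path.join(snap, "data.log"), "rb") as f:
        snap_data = f.read()
    with open(store.path, "rb") as f:
        live_data = f.read()
    shutil.rmtree(base)
    return snap_data, live_data
```

The point of the ordering is that the snapshot only ever sees whole records that were written and synced before it was taken; anything written afterwards shows up in the live copy only.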