On Mar 22, 2013, Chris Mason <clmason@xxxxxxxxxxxx> wrote:

> Are you using compression in btrfs or just in leveldb?

btrfs lzo compression.

> I'd like to take snapshots out of the picture for a minute.

That's understandable, I guess, but I don't know that anyone has ever hit the problem without snapshots. I mean, even when the master copy of the database got corrupted, snapshots of the subvol containing it were being taken every now and again, because that's the way ceph works. Even back when I noticed corruption of firefox _CACHE_* files, snapshots taken for archival were involved. So, unless the program happens to trigger the problem with the -DNOSNAPS option about as easily as it did without it, I guess we may have no choice but to keep snapshots in the picture.

> We need some way to synchronize the leveldb with snapshotting

I purposefully refrained from doing that, because AFAICT ceph doesn't do it. After I failed to trigger the problem with Sync calls, and determined that ceph only syncs the leveldb logs before taking its snapshots, I went without syncing and finally succeeded in triggering the bug in snapshots, by simulating snapshotting and mmapping conditions very similar to those generated by ceph.

I haven't managed to trigger the corruption of the master subvol with the test program yet, but I already knew that its corruption didn't occur as often as that of the snapshots. Since it smells like two slightly different symptoms of the same bug, I decided to leave the test program at that.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer