Re: corruption of active mmapped files in btrfs snapshots

Alexandre Oliva <oliva@xxxxxxx> · Fri, 22 Mar 2013 11:17:30 -0300

On Mar 22, 2013, Chris Mason <clmason@xxxxxxxxxxxx> wrote:

> Are you using compression in btrfs or just in leveldb?

btrfs lzo compression.

> I'd like to take snapshots out of the picture for a minute.

That's understandable, I guess, but I don't know that anyone has ever
hit the problem without snapshots.  I mean, even when the master copy of
the database got corrupted, snapshots of the subvol containing it were
being taken every now and again, because that's the way ceph works.
Even back when I noticed corruption of firefox _CACHE_* files, snapshots
taken for archival were involved.  So, unless the program happens to
trigger the problem with the -DNOSNAPS option about as easily as it did
without it, I guess we may not have a choice but to keep snapshots in
the picture.

> We need some way to synchronize the leveldb with snapshotting

I purposefully refrained from doing that, because AFAICT ceph doesn't do
that.  After I failed to trigger the problem with Sync calls in place,
and determined that ceph only syncs the leveldb logs before taking its
snapshots, I went without syncing and finally succeeded in triggering
the bug in snapshots, by simulating snapshotting and mmapping conditions
very similar to those generated by ceph.  I haven't managed to trigger
the corruption of the master subvol with the test program yet, but I
already knew that it didn't occur as often as corruption of the
snapshots, and since the two smell like slightly different symptoms of
the same bug, I decided to leave the test program at that.
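
To make the access pattern described above concrete, here is a minimal
sketch of that kind of workload.  It is not the actual test program: it
only illustrates dirtying an mmapped file and taking snapshots with no
msync in between.  The names run_workload, btrfs_snapshot and
final_content_ok, and all their parameters, are assumptions made up for
this illustration.  Passing snapshot=None plays roughly the role that
-DNOSNAPS is described as playing in the real program.

```python
import mmap
import subprocess

PAGE = mmap.PAGESIZE


def btrfs_snapshot(subvol, dest):
    # Read-only snapshot through the btrfs CLI.  Needs a btrfs
    # filesystem and sufficient privileges; paths come from the caller.
    subprocess.run(
        ["btrfs", "subvolume", "snapshot", "-r", subvol, dest],
        check=True,
    )


def run_workload(path, pages=16, rounds=8, snap_every=2, snapshot=None):
    """Dirty an mmapped file, snapshotting without msync in between.

    snapshot is a hypothetical callback taking the snapshot index;
    None disables snapshots.  Returns the number of snapshots taken.
    rounds must stay below 256 because each round is stored as a byte.
    """
    with open(path, "wb") as f:
        f.truncate(pages * PAGE)  # zero-filled file of whole pages

    taken = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), pages * PAGE) as m:
        for r in range(1, rounds + 1):
            # Fill every page with the current round number.
            for p in range(pages):
                m[p * PAGE:(p + 1) * PAGE] = bytes([r]) * PAGE
            # Deliberately no m.flush() (msync) before snapshotting.
            if snapshot is not None and r % snap_every == 0:
                snapshot(taken)
                taken += 1
    return taken


def final_content_ok(path, pages=16, rounds=8):
    # After the mapping is closed, every page of the master copy
    # should hold the last round number.
    with open(path, "rb") as f:
        return f.read() == bytes([rounds]) * (pages * PAGE)
```

On a real btrfs subvolume the callback could be something like
lambda i: btrfs_snapshot(subvol, "%s/snap%d" % (snapdir, i)), where
subvol and snapdir are placeholders for paths on the filesystem under
test.  Deciding what counts as corruption inside a snapshot is left
out here; the sketch only checks the final state of the master file.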

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer