Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing

Josh Durgin <josh.durgin@xxxxxxxxxxx> · Fri, 22 Mar 2013 12:30:23 -0700

On 03/22/2013 12:09 PM, Oliver Francke wrote:
Hi Josh, all,

I did not want to hijack the thread dealing with a crashing VM, but perhaps there are some common things.

Today I installed a fresh cluster with mkcephfs, which went fine, imported a "master" Debian 6.0 image as "format 2", made a snapshot, protected it, and made some clones.
I mounted the clones with qemu-nbd, fiddled a bit with IP/interfaces/hosts/net.rules… etc., and cleanly unmounted them. I started a VM; it took 2 secs and the VM was up and running. Cool.

Then I performed an ordinary shutdown and made a snapshot of this image. I started it again and did some "apt-get update… install s/t…".
Shutdown -> rbd rollback -> startup again -> login -> install s/t else… the filesystem showed "many" ext3 errors and fell into read-only mode: massive corruption.

This sounds like it might be a bug in rollback. Could you try cloning
and snapshotting again, but export the image before booting, and after
rolling back, and compare the md5sums?
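
The comparison suggested above can be sketched in a few lines of Python. This is only an illustration, assuming the image has already been written to two local files (for example with `rbd export`, once before booting and once after the rollback); the function names and file names here are hypothetical, not part of any Ceph tooling.

```python
import hashlib


def file_md5(path, chunk_size=4 * 1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exports_match(before_path, after_path):
    """True if the two exported image files have identical MD5 sums."""
    return file_md5(before_path) == file_md5(after_path)
```

Calling `exports_match("before-boot.img", "after-rollback.img")` (hypothetical file names) would return False if the rollback did not restore the image to its snapshotted contents.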

Running the rollback with:

--debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log

might help too. Does your ceph.conf where you ran the rollback have
anything related to rbd_cache in it?
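
To answer the ceph.conf question quickly, something like the following rough sketch could list any cache-related options. It assumes the usual INI-style layout with `[section]` headers and `key = value` lines, and it treats spaces and underscores in option names as equivalent; the function name is hypothetical, and the parsing is deliberately simplistic (it is not how Ceph itself reads the file).

```python
def find_rbd_cache_settings(conf_text):
    """Return (section, key, value) for each option whose name starts with rbd_cache."""
    hits = []
    section = None
    for raw in conf_text.splitlines():
        # Drop trailing comments; both '#' and ';' are treated as comment markers here.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalize "rbd cache size" to "rbd_cache_size".
        key = "_".join(key.lower().split())
        if key.startswith("rbd_cache"):
            hits.append((section, key, value.strip()))
    return hits
```

For example, `find_rbd_cache_settings(open("/etc/ceph/ceph.conf").read())` (the path is an assumption; use whichever ceph.conf the rbd command actually read) returns an empty list when nothing related to rbd_cache is set.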

The qemu config had ":rbd_cache=false", if it matters. The above scenario is reproducible and, as I stated, no crash was detected.

Perhaps it is in the same area as the crash thread; otherwise I will provide logfiles as needed.

It's unrelated; the other thread is about an issue with the cache, which
does not cause corruption but triggers a crash.

Josh