Thanks for getting back to me, Josh.

I've updated to the new 0.55 release and I haven't been able to reproduce the problem. I have the feeling I may be to blame: when I updated to 0.55, qemu-img segfaulted with a librbd error because there was an old version of the librbd library in another path (which I think was from 0.54). Once I cleaned everything up it worked fine.

One thing I did notice about the 0.55 release is that 'ceph osd create' no longer accepts arguments and gives '(22) Invalid argument' if you try to specify an OSD number. Running the command without an argument correctly creates an OSD with the next free OSD number. I wasn't sure whether this is a bug or whether the command has changed for 0.55+ and the documentation hasn't been updated yet (the Add/Remove OSDs page in the wiki references the command with arguments).

Thanks again
-Matt

-----Original Message-----
From: Josh Durgin [mailto:josh.durgin@xxxxxxxxxxx]
Sent: Monday, 3 December 2012 6:01 PM
To: Matthew Anderson
Cc: 'ceph-devel@xxxxxxxxxxxxxxx'
Subject: Re: VM Corruption on 0.54 when 'client cache = false'

That disabling caching improves write speed sounds like something strange is going on. What's the full QEMU/KVM command line and ceph.conf used when running the VM?

The corruption issue is more serious, and not something I've seen reported before. Does it occur only with Windows Server 2012 VMs, or does it happen with a Linux VM as well? More specific debugging suggestions are below.

fiemap is off by default since we discovered that issue, so this is a different bug.

Since the guest can't find its partitions, could you try exporting the image to a file (rbd export pool/image filename), and then run gdisk -l on the file? Doing this before booting, and then again after the corruption occurs and the VM is shut down, might help determine the nature of the corruption and which parts of the image are corrupted.
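
In addition to gdisk -l, the two exported files can be compared directly to see which regions changed. The following is only a rough sketch, not part of any Ceph tooling: the function name differing_ranges and the file names in the usage note are hypothetical, and the 4 MiB chunk size is an assumption chosen to line up with the default RBD object size.

```python
import sys

# Assumption: 4 MiB chunks, chosen to match the default RBD object size.
CHUNK = 4 * 1024 * 1024


def differing_ranges(path_a, path_b, chunk=CHUNK):
    """Return (start, end) byte ranges where the two files differ.

    Comparison is done at chunk granularity, and adjacent differing
    chunks are merged into a single range. 'end' is exclusive.
    """
    ranges = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:
                break  # both files exhausted
            step = max(len(a), len(b))
            if a != b:
                if ranges and ranges[-1][1] == offset:
                    # Extend the previous range instead of starting a new one.
                    ranges[-1] = (ranges[-1][0], offset + step)
                else:
                    ranges.append((offset, offset + step))
            offset += step
    return ranges


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for start, end in differing_ranges(sys.argv[1], sys.argv[2]):
            print(f"bytes {start}-{end - 1} differ")
```

Saved as, say, compare_exports.py, it would be run as 'python3 compare_exports.py before.img after.img' on the two export files. Writes made by the guest during normal operation will also show up as differences, so the output only narrows down where to look.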
If you run the VM with 'debug ms = 1', 'debug objectcacher = 20', 'debug librbd = 20', and 'log file = /path/to/file/writeable/by/qemu' in the [client] section of ceph.conf, we might be able to see what's happening to the problematic parts of the image. If the logs are long, you can attach them to a bug report referring to this email at http://tracker.newdream.net.

Another thing to try is running 'ceph osd deep-scrub', which will check for consistency of objects across OSDs, and report problems in 'ceph -s'.

Josh
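
For reference, the logging settings listed above would look roughly like this as a ceph.conf fragment. The values are the ones given in the message; the log path is the placeholder from the message and needs to be replaced with a real location that the QEMU process can write to.

```ini
[client]
    ; messenger-level logging
    debug ms = 1
    ; verbose logging for the client-side cache and librbd
    debug objectcacher = 20
    debug librbd = 20
    ; placeholder path: must be writable by the user running QEMU
    log file = /path/to/file/writeable/by/qemu
```

These settings are read when the client starts, so the VM presumably needs to be restarted for them to take effect.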