Hi,

I just started testing VMs inside Ceph this week (ceph-hammer 0.94-5 here). I built several pools, using pool tiering (rough setup sketched below):

- a small replicated SSD pool (only 5 SSDs, but I thought it'd be better for IOPS; I intend to test the difference against a disks-only setup),
- overlaying a larger EC pool.
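For reference, the tiering was set up roughly like this (reconstructed from memory; the PG counts are placeholders and the CRUSH rule that pins the hot pool to the SSD OSDs is omitted here):

# backing EC pool and replicated SSD cache pool (names as in my setup, PG counts are guesses)
ceph osd pool create irfu-virt 512 512 erasure
ceph osd pool create ssd-hot-irfu-virt 128 128 replicated
# attach the SSD pool as a writeback cache tier in front of the EC pool
ceph osd tier add irfu-virt ssd-hot-irfu-virt
ceph osd tier cache-mode ssd-hot-irfu-virt writeback
ceph osd tier set-overlay irfu-virt ssd-hot-irfu-virt
ceph osd pool set ssd-hot-irfu-virt hit_set_type bloom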
I have just 2 VMs in Ceph so far… and one of them is breaking something.

The VM that is not breaking was migrated using qemu-img to create the Ceph volume and then migrate the data. Its RBD format is 1:

rbd image 'xxx-disk1':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.83a49.3d1b58ba
        format: 1
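For what it's worth, that migration was done with something along these lines (the source path and source format here are just an example, not the exact command I ran):

# convert the original disk image straight into a new RBD volume in the pool
# (cephx auth is picked up from /etc/ceph on the host)
qemu-img convert -f qcow2 -O raw /var/lib/libvirt/images/xxx-disk1.qcow2 \
        rbd:irfu-virt/xxx-disk1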
The VM that is failing has RBD format 2. This is what I had before things started breaking:

rbd image 'yyy-disk1':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.8ae1f47398c89
        format: 2
        features: layering, striping
        flags:
        stripe unit: 4096 kB
        stripe count: 1

The VM started behaving weirdly, with a huge IOwait %, during its install (that's to say it did not take long to go wrong ;) ). Now, this is the only thing I can get:

[root@ceph0 ~]# rbd -p irfu-virt info yyy-disk1
2016-02-24 18:30:33.213590 7f00e6f6d7c0 -1 librbd::ImageCtx: error reading image id: (95) Operation not supported
rbd: error opening image yyy-disk1: (95) Operation not supported

One thing to note: the VM *IS STILL* working; I can still do disk operations in it, apparently.

During the VM installation, I realized I had wrongly set the target SSD cache size to 100 Mbytes instead of 100 Gbytes, and Ceph complained it was almost full:

    health HEALTH_WARN
           'ssd-hot-irfu-virt' at/near target max
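I assume the mistake was something like this (reconstructed; the values are 100 MB vs 100 GB expressed in bytes):

# what I actually set by mistake (~100 MB):
ceph osd pool set ssd-hot-irfu-virt target_max_bytes 104857600
# what I meant to set (~100 GB):
ceph osd pool set ssd-hot-irfu-virt target_max_bytes 107374182400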
My question is… am I facing the bug reported in the list thread titled "Possible Cache Tier Bug - Can someone confirm"? Or did I do something wrong?

The libvirt and kvm writing into Ceph are the following:

libvirt-1.2.17-13.el7_2.3.x86_64
qemu-kvm-1.5.3-105.el7_2.3.x86_64

Any idea how I could recover the VM file, if possible? Please note I have no problem with deleting the VM and rebuilding it; I only spawned it to test. As a matter of fact, I just "virsh destroyed" the VM to see if I could start it again… and I can't:

# virsh start yyy
error: Failed to start domain yyy
error: internal error: process exited while connecting to monitor: 2016-02-24T17:49:59.262170Z qemu-kvm: -drive file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***==:auth_supported=cephx\;none:mon_host=_____\:6789,if=none,id=drive-virtio-disk0,format=raw: error reading header from yyy-disk1
2016-02-24T17:49:59.263743Z qemu-kvm: -drive file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=A***==:auth_supported=cephx\;none:mon_host=___\:6789,if=none,id=drive-virtio-disk0,format=raw: could not open disk image rbd:irfu-virt/___-disk1:id=irfu-***==:auth_supported=cephx\;none:mon_host=___\:6789: Could not open 'rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***
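In case it's relevant, this is what I was thinking of trying next before giving up on the image (I'm not sure whether flushing the cache tier is the right move here, so please tell me if it's a bad idea):

# check whether the format 2 id/header objects are still visible in the base pool
rados -p irfu-virt ls | grep yyy-disk1
# try to flush and evict everything from the cache tier, then retry rbd info
rados -p ssd-hot-irfu-virt cache-flush-evict-all
rbd -p irfu-virt info yyy-disk1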
Ideas? Thanks,

Frederic