Hi,

I just started testing VMs inside Ceph this week (ceph-hammer 0.94-5 here). I built several pools, using pool tiering (rough setup sketched below):

- a small replicated SSD pool (only 5 SSDs, but I thought it'd be better for IOPS; I intend to test the difference against a disks-only setup),
- overlaying a larger EC pool.
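For reference, the tiering was set up roughly like this (reconstructed from memory; the PG counts are placeholders and the CRUSH rule that pins the hot pool to the SSD OSDs is omitted here):

# backing EC pool and replicated SSD cache pool (names as in my setup, PG counts are guesses)
ceph osd pool create irfu-virt 512 512 erasure
ceph osd pool create ssd-hot-irfu-virt 128 128 replicated
# attach the SSD pool as a writeback cache tier in front of the EC pool
ceph osd tier add irfu-virt ssd-hot-irfu-virt
ceph osd tier cache-mode ssd-hot-irfu-virt writeback
ceph osd tier set-overlay irfu-virt ssd-hot-irfu-virt
ceph osd pool set ssd-hot-irfu-virt hit_set_type bloom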
I have just 2 VMs in Ceph so far… and one of them is breaking something.

The VM that is not breaking was migrated using qemu-img to create the Ceph volume and then migrate the data. Its RBD format is 1:

rbd image 'xxx-disk1':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.83a49.3d1b58ba
        format: 1
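For what it's worth, that migration was done with something along these lines (the source path and source format here are just an example, not the exact command I ran):

# convert the original disk image straight into a new RBD volume in the pool
# (cephx auth is picked up from /etc/ceph on the host)
qemu-img convert -f qcow2 -O raw /var/lib/libvirt/images/xxx-disk1.qcow2 \
        rbd:irfu-virt/xxx-disk1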
The VM that is failing has RBD format 2. This is what I had before things started breaking:

rbd image 'yyy-disk1':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.8ae1f47398c89
        format: 2
        features: layering, striping
        flags:
        stripe unit: 4096 kB
        stripe count: 1

The VM started behaving weirdly, with a huge IOwait %, during its install (that's to say it did not take long to go wrong ;) ). Now, this is the only thing I can get:

[root@ceph0 ~]# rbd -p irfu-virt info yyy-disk1
2016-02-24 18:30:33.213590 7f00e6f6d7c0 -1 librbd::ImageCtx: error reading image id: (95) Operation not supported
rbd: error opening image yyy-disk1: (95) Operation not supported

One thing to note: the VM *IS STILL* working; I can still do disk operations in it, apparently.

During the VM installation, I realized I had wrongly set the target SSD cache size to 100 Mbytes instead of 100 Gbytes, and Ceph complained it was almost full:

    health HEALTH_WARN
           'ssd-hot-irfu-virt' at/near target max
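I assume the mistake was something like this (reconstructed; the values are 100 MB vs 100 GB expressed in bytes):

# what I actually set by mistake (~100 MB):
ceph osd pool set ssd-hot-irfu-virt target_max_bytes 104857600
# what I meant to set (~100 GB):
ceph osd pool set ssd-hot-irfu-virt target_max_bytes 107374182400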
My question is… am I facing the bug reported in the list thread titled "Possible Cache Tier Bug - Can someone confirm"? Or did I do something wrong?

The libvirt and kvm writing into Ceph are the following:

libvirt-1.2.17-13.el7_2.3.x86_64
qemu-kvm-1.5.3-105.el7_2.3.x86_64

Any idea how I could recover the VM file, if possible? Please note I have no problem with deleting the VM and rebuilding it; I only spawned it to test. As a matter of fact, I just "virsh destroyed" the VM to see if I could start it again… and I can't:

# virsh start yyy
error: Failed to start domain yyy
error: internal error: process exited while connecting to monitor: 2016-02-24T17:49:59.262170Z qemu-kvm: -drive file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***==:auth_supported=cephx\;none:mon_host=_____\:6789,if=none,id=drive-virtio-disk0,format=raw: error reading header from yyy-disk1
2016-02-24T17:49:59.263743Z qemu-kvm: -drive file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=A***==:auth_supported=cephx\;none:mon_host=___\:6789,if=none,id=drive-virtio-disk0,format=raw: could not open disk image rbd:irfu-virt/___-disk1:id=irfu-***==:auth_supported=cephx\;none:mon_host=___\:6789: Could not open 'rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***
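In case it's relevant, this is what I was thinking of trying next before giving up on the image (I'm not sure whether flushing the cache tier is the right move here, so please tell me if it's a bad idea):

# check whether the format 2 id/header objects are still visible in the base pool
rados -p irfu-virt ls | grep yyy-disk1
# try to flush and evict everything from the cache tier, then retry rbd info
rados -p ssd-hot-irfu-virt cache-flush-evict-all
rbd -p irfu-virt info yyy-disk1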
Ideas? Thanks,

Frederic