Re: Openstack on ceph rbd installation failure

johnu <johnugeorge109@xxxxxxxxx> · Tue, 23 Jul 2013 17:47:34 -0700

The issue is that I can create a volume, but I can attach it to an instance only when the instance is shut down.

If I attach a volume to an instance that is already shut down and then restart the instance, it goes into the "error" state.

The logs are attached.

Jul 23 17:06:10 master 2013-07-23 17:06:10.513 ERROR nova.compute.manager [req-ecff0f93-aa84-4471-aa47-4628c790fa54 admin admin] [instance: e1c8a73a-ff63-4c09-b24a-2ab755aa4836] Cannot reboot instance: [Errno 32] Corrupt image download. Checksum was d2f67e6e12e87ce50a42b7f0c595cde2 expected c352f4e7121c6eae958bc1570324f17e

Jul 23 17:06:10 master 2013-07-23 17:06:10.934 INFO nova.osapi_compute.wsgi.server [-] (3925) accepted ('171.71.119.2', 37555)
Jul 23 17:06:11 master 2013-07-23 17:06:11.401 ERROR nova.openstack.common.rpc.amqp [req-ecff0f93-aa84-4471-aa47-4628c790fa54 admin admin] Exception during message handling
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 426, in _process_data
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp     **args)
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp     result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/exception.py", line 99, in wrapped
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp     temp_level, payload)
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/exception.py", line 76, in wrapped
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp     return f(self, context, *args, **kw)
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/compute/manager.py", line 228, in decorated_function
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp     pass
2013-07-23 17:06:11.401 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/contextlib.py",

Logs collected when I rebooted another instance:

15:32.666 ERROR nova.compute.manager [req-464776fd-2832-4f76-91fa-3e4eff173064 None None] [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64] error during stop() in sync_power_state.
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64] Traceback (most recent call last):
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]   File "/opt/stack/nova/nova/compute/manager.py", line 4421, in _sync_instance_power_state
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]     self.conductor_api.compute_stop(context, db_instance)
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]   File "/opt/stack/nova/nova/conductor/api.py", line 333, in compute_stop
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]     return self._manager.compute_stop(context, instance, do_cast)
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]   File "/opt/stack/nova/nova/conductor/rpcapi.py", line 483, in compute_stop
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]     return self.call(context, msg, version='1.43')
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]   File "/opt/stack/nova/nova/openstack/common/rpc/proxy.py", line 126, in call
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]     result = rpc.call(context, real_topic, msg, timeout)
2013-07-23 17:15:32.666 TRACE nova.compute.manager [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64]

Jul 23 17:17:18 slave2 2013-07-23 17:17:18.380 ERROR nova.virt.libvirt.driver [req-560b46ed-e96e-4645-a23e-3eba6f51437c admin admin] An error occurred while trying to launch a defined domain with xml: <domain type='qemu'>
  <name>instance-0000000b</name>
  <uuid>4b58dea1-f281-4818-82da-8b9f5f923f64</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>2013.2</entry>
      <entry name='serial'>38047832-f758-4e6d-aedf-2d6cf02d6b1e</entry>
      <entry name='uuid'>4b58dea1-f281-4818-82da-8b9f5f923f64</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.4'>hvm</type>
    <kernel>/opt/stack/data/nova/instances/4b58dea1-f281-4818-82da-8b9f5f923f64/kernel</kernel>
    <initrd>/opt/stack/data/nova/instances/4b58dea1-f281-4818-82da-8b9f5f923f64/ramdisk</initrd>
    <cmdline>root=/dev/vda console=tty0 console=ttyS0</cmdline>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/opt/stack/data/nova/instances/4b58dea1-f281-4818-82da-8b9f5f923f64/disk'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='volumes'>
        <secret type='ceph' uuid='62d0b384-5

Jul 23 17:17:18 slave2 2013-07-23 17:17:18.410 ERROR nova.compute.manager [req-560b46ed-e96e-4645-a23e-3eba6f51437c admin admin] [instance: 4b58dea1-f281-4818-82da-8b9f5f923f64] Cannot reboot instance: internal error rbd username 'volumes' specified but secret not found

I had set up the virsh secret as described in the Ceph OpenStack documentation. How can I verify it?
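
One possible way to check, sketched in Python under a few assumptions: `virsh secret-list` prints the UUIDs of the secrets libvirt knows about on the local host, and `virsh secret-get-value <uuid>` prints the stored base64 key. The helper names are made up for illustration, and the UUID in the usage note is the rbd_secret_uuid from the cinder.conf quoted later in this message. The check only makes sense on the compute node that reported "secret not found" (slave2 in the log above).

```python
import re
import subprocess

# Matches a canonical 8-4-4-4-12 hexadecimal UUID anywhere in a string.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def uuids_in(text):
    """Return the set of lower-cased UUIDs found anywhere in text."""
    return {match.lower() for match in UUID_RE.findall(text)}


def secret_is_defined(listing, wanted_uuid):
    """True if wanted_uuid appears in the output of `virsh secret-list`."""
    return wanted_uuid.lower() in uuids_in(listing)


def check_local_libvirt(wanted_uuid):
    """Ask libvirt on this host whether the secret exists and has a value.

    Needs virsh on PATH and enough privileges to talk to libvirtd.
    subprocess raises CalledProcessError if virsh exits non-zero.
    """
    listing = subprocess.check_output(["virsh", "secret-list"]).decode()
    if not secret_is_defined(listing, wanted_uuid):
        return False
    value = subprocess.check_output(
        ["virsh", "secret-get-value", wanted_uuid]).decode()
    return bool(value.strip())
```

Calling `check_local_libvirt("62d0b384-50ad-2e17-15ed-66bfeda40252")` on slave2 would show whether the UUID referenced in the domain XML is actually defined there. The value it stores should match what `ceph auth get-key client.volumes` prints.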

On Tue, Jul 23, 2013 at 4:49 PM, johnu <johnugeorge109@xxxxxxxxx> wrote:

There seems to be a hidden bug that I couldn't reproduce. I am using devstack for OpenStack, and I had enabled the syslog option to collect the nova and cinder logs. After a reboot, everything was fine. I was able to create volumes, and I verified them in rados.

Another thing I noticed: I don't have a cinder user with the devstack script. So I didn't change the ownership of the keyring files, and they are still owned by root. It works fine nonetheless.

On Tue, Jul 23, 2013 at 6:19 AM, Sebastien Han <sebastien.han@xxxxxxxxxxxx> wrote:

Can you send your ceph.conf too?
Is /etc/ceph/ceph.conf present? Is the key for the volumes user present too?
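
A minimal Python sketch of that check. The keyring path is an assumption: it follows the default /etc/ceph/$cluster.$name.keyring naming for a client called "volumes" and may differ on your nodes. The function name is hypothetical.

```python
import os


def check_ceph_files(conf_path="/etc/ceph/ceph.conf",
                     keyring_path="/etc/ceph/ceph.client.volumes.keyring"):
    """Return a dict mapping each path to 'ok', 'missing', or 'unreadable'.

    Readability is tested for the user running this script, so run it as
    the same user that runs cinder-volume.
    """
    report = {}
    for path in (conf_path, keyring_path):
        if not os.path.isfile(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK):
            report[path] = "unreadable"
        else:
            report[path] = "ok"
    return report
```

Running `check_ceph_files()` on each node that hosts cinder-volume or nova-compute gives a quick view of which file, if any, is absent or not readable.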

––––
Sébastien Han
Cloud Engineer

On Jul 23, 2013, at 5:39 AM, johnu <johnugeorge109@xxxxxxxxx> wrote:

Hi,
I have a three-node Ceph cluster, and ceph -w reports health OK. I have OpenStack in the same cluster and am trying to map Cinder and Glance onto RBD.

I followed the steps given in http://ceph.com/docs/next/rbd/rbd-openstack/

New settings added to cinder.conf on the three nodes:

volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=volumes
glance_api_version=2
rbd_user=volumes
rbd_secret_uuid=62d0b384-50ad-2e17-15ed-66bfeda40252 (different for each node)
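
A small Python sketch for confirming that these settings are actually present in the cinder.conf a node reads. The function name and the list of keys checked are illustrative assumptions, and the sketch expects the options under a [DEFAULT] section. One thing that may be worth checking, offered as a guess rather than a diagnosis: the UUID that libvirt is asked for comes from rbd_secret_uuid, so if it differs per node, an instance on one compute node could reference a secret that is only defined on another.

```python
import configparser

# Keys taken from the settings listed above; treating them all as
# required is an assumption made for this sketch.
REQUIRED = ("volume_driver", "rbd_pool", "rbd_user", "rbd_secret_uuid")


def rbd_settings(conf_text, section="DEFAULT"):
    """Parse cinder.conf text and return (values, missing_keys)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    values = dict(parser[section])
    missing = [key for key in REQUIRED if key not in values]
    return values, missing
```

For example, `rbd_settings(open("/etc/cinder/cinder.conf").read())` returns the parsed options and a list of any expected keys that are absent. Comparing the `rbd_secret_uuid` value across nodes then takes one call per node.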

Logs seen when I run ./rejoin.sh:

2013-07-22 20:35:01.900 INFO cinder.service [-] Starting 1 workers
2013-07-22 20:35:01.909 INFO cinder.service [-] Started child 2290
2013-07-22 20:35:01.965 AUDIT cinder.service [-] Starting cinder-volume node (version 2013.2)

2013-07-22 20:35:02.129 ERROR cinder.volume.drivers.rbd [req-d3bc2e86-e9db-40e8-bcdb-08c609ce44c3 None None] error connecting to ceph cluster
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd Traceback (most recent call last):
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 243, in check_for_setup_error
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd     with RADOSClient(self):
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 215, in __init__
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd     self.cluster, self.ioctx = driver._connect_to_rados(pool)
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 263, in _connect_to_rados
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd     client.connect()
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd   File "/usr/lib/python2.7/dist-packages/rados.py", line 192, in connect
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd     raise make_ex(ret, "error calling connect")
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd ObjectNotFound: error calling connect
2013-07-22 20:35:02.129 TRACE cinder.volume.drivers.rbd 
2013-07-22 20:35:02.149 ERROR cinder.service [req-d3bc2e86-e9db-40e8-bcdb-08c609ce44c3 None None] Unhandled exception
2013-07-22 20:35:02.149 TRACE cinder.service Traceback (most recent call last):
2013-07-22 20:35:02.149 TRACE cinder.service   File "/opt/stack/cinder/cinder/service.py", line 228, in _start_child
2013-07-22 20:35:02.149 TRACE cinder.service     self._child_process(wrap.server)
2013-07-22 20:35:02.149 TRACE cinder.service   File "/opt/stack/cinder/cinder/service.py", line 205, in _child_process
2013-07-22 20:35:02.149 TRACE cinder.service     launcher.run_server(server)
2013-07-22 20:35:02.149 TRACE cinder.service   File "/opt/stack/cinder/cinder/service.py", line 96, in run_server
2013-07-22 20:35:02.149 TRACE cinder.service     server.start()
2013-07-22 20:35:02.149 TRACE cinder.service   File "/opt/stack/cinder/cinder/service.py", line 359, in start
2013-07-22 20:35:02.149 TRACE cinder.service     self.manager.init_host()
2013-07-22 20:35:02.149 TRACE cinder.service   File "/opt/stack/cinder/cinder/volume/manager.py", line 139, in init_host
2013-07-22 20:35:02.149 TRACE cinder.service     self.driver.check_for_setup_error()
2013-07-22 20:35:02.149 TRACE cinder.service   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 248, in check_for_setup_error
2013-07-22 20:35:02.149 TRACE cinder.service     raise exception.VolumeBackendAPIException(data=msg)
2013-07-22 20:35:02.149 TRACE cinder.service VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: error connecting to ceph cluster
2013-07-22 20:35:02.149 TRACE cinder.service 

2013-07-22 20:35:02.191 INFO cinder.service [-] Child 2290 exited with status 2
2013-07-22 20:35:02.192 INFO cinder.service [-] _wait_child 1
2013-07-22 20:35:02.193 INFO cinder.service [-] wait wrap.failed True
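
The failing call in the trace is client.connect() from the python rados bindings. A hedged sketch for reproducing that step outside of cinder follows. The function name, the default paths, and the pool name are assumptions drawn from the settings above, and an ObjectNotFound here often (not always) points at a ceph.conf or keyring that the process cannot find.

```python
def try_rados_connect(conffile="/etc/ceph/ceph.conf",
                      user="volumes", pool="volumes"):
    """Attempt the same connect sequence outside cinder.

    Returns (ok, message) instead of raising, so it can be run on each
    node and the messages compared.
    """
    try:
        import rados  # python rados bindings; may not be installed
    except ImportError:
        return False, "python rados bindings not installed"

    try:
        # rados_id is the client name without the "client." prefix.
        cluster = rados.Rados(conffile=conffile, rados_id=user)
        cluster.connect()
    except rados.Error as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)

    try:
        ioctx = cluster.open_ioctx(pool)
        ioctx.close()
        return True, "connected and opened pool %r" % pool
    except rados.Error as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    finally:
        cluster.shutdown()
```

Run it as the same user that starts cinder-volume, for example `print(try_rados_connect())`, so that file permissions on the keyring are exercised the same way.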

Can someone suggest some debugging pointers to help me solve this?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
