Re: debug RBD timeout issue


 



That's what I am trying to figure out: what exactly could cause a timeout.
The user creates 10 VMs with Terraform (each boots from a volume and has one
additional volume attached), then destroys them. Repeating this works fine most
of the time, but occasionally a timeout occurs at a different point, either
during volume creation or volume deletion.
Terraform manages resources in parallel (10 at a time by default), so I am not
sure whether the way cinder-volume handles those concurrent requests matters.
I doubt I can reproduce it with rbd directly.
I will enable debug logging in cinder-volume to get more info. In the meantime,
I wonder how I can get more info from Ceph to understand these timeouts better.
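
To see whether the timeouts cluster around one kind of operation, the
cinder-volume log can be tallied by error message. The following is only a
rough sketch: it assumes each failure is logged on a single line in the same
shape as the "Exception during message handling" line quoted further down in
this thread, and the helper name tally_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "rbd.Timeout: [errno 110] error removing image"
TIMEOUT_RE = re.compile(r"rbd\.Timeout: \[errno (\d+)\] (.+)$")
# Matches the request ID that opens the oslo log context, "[req-<uuid> ..."
REQ_RE = re.compile(r"\[(req-[0-9a-f-]+)")


def tally_timeouts(lines):
    """Return (Counter of timeout messages, {request_id: message}).

    Only lines carrying a request ID are counted, so the last line of the
    traceback (which repeats the same message) is not counted twice.
    """
    by_request = {}
    for line in lines:
        req = REQ_RE.search(line)
        hit = TIMEOUT_RE.search(line.rstrip())
        if req and hit:
            by_request[req.group(1)] = hit.group(2)
    return Counter(by_request.values()), by_request
```

Feeding it the open log file (the path depends on the deployment) gives a
count per message, and the request IDs can then be grepped to follow each
failing request through the rest of the log.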


Thanks!
Tony
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: September 8, 2021 01:05 AM
To: ceph-users@xxxxxxx
Subject:  Re: debug RBD timeout issue

Hi,

from an older cloud version I remember having to increase these settings:

[DEFAULT]
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10
block_device_creation_timeout = 300
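
As a rough illustration of what the first two values add up to, here is the
arithmetic, assuming the service checks the volume once per interval and gives
up after the configured number of retries. That polling behaviour is my
assumption, and max_wait_seconds is just an illustrative helper, not part of
any OpenStack API.

```python
def max_wait_seconds(retries, interval):
    """Upper bound on time spent polling, ignoring the time each check takes."""
    return retries * interval


budget = max_wait_seconds(retries=300, interval=10)
print(budget, "seconds, about", budget // 60, "minutes")
```

With the values above that is 3000 seconds, so roughly 50 minutes before the
allocation would be given up on.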


The question is what exactly could cause a timeout. You write that you
only see these timeouts from time to time, so you should try to find
out what differs between successful and failing volumes. Is
it the size or something else? Which glance stores are enabled? Can you
reproduce it, for example with 'rbd create...' as the cinder user? Then
you could increase 'debug_rbd' and see if that reveals anything.
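
A minimal sketch of such a reproduction attempt with the Python rados/rbd
bindings, running creates and removes in parallel and timing each one. The
pool name "volumes", the conffile path, the log file path, the image size and
the helper names are all assumptions to adapt; one connection is opened per
operation here, which may or may not match what cinder-volume does.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


def timed(label, fn, *args):
    """Run fn(*args); return (label, seconds, exception-or-None)."""
    start = time.monotonic()
    try:
        fn(*args)
        return (label, time.monotonic() - start, None)
    except Exception as exc:  # an rbd.Timeout would end up here
        return (label, time.monotonic() - start, exc)


def stress(create, remove, count=10, workers=10):
    """Create `count` images, then remove them, `workers` at a time."""
    names = ["repro-%s" % uuid.uuid4().hex[:8] for _ in range(count)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results += pool.map(lambda n: timed("create " + n, create, n), names)
        results += pool.map(lambda n: timed("remove " + n, remove, n), names)
    return results


def rbd_ops(pool_name="volumes", user="cinder", size=1 << 30):
    """Build create/remove callables on top of the rados/rbd bindings."""
    import rados  # imported lazily so stress() can be tried without Ceph
    import rbd

    def with_ioctx(action):
        # Client-side debug settings are passed per connection (assumed paths).
        cluster = rados.Rados(
            conffile="/etc/ceph/ceph.conf",
            rados_id=user,
            conf={"debug_rbd": "20", "log_file": "/tmp/rbd-repro.log"},
        )
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx(pool_name)
            try:
                action(ioctx)
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()

    def create(name):
        with_ioctx(lambda ioctx: rbd.RBD().create(ioctx, name, size))

    def remove(name):
        with_ioctx(lambda ioctx: rbd.RBD().remove(ioctx, name))

    return create, remove
```

Calling stress(*rbd_ops()) on a host that has the cinder keyring returns one
(label, seconds, exception) tuple per operation, so slow or failing calls
stand out and can be matched against the client log.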


Quoting Tony Liu <tonyliu0592@xxxxxxxxxxx>:

> Hi,
>
> I have OpenStack Ussuri and Ceph Octopus. Sometimes I see a timeout
> when creating or deleting volumes. I can see an RBD timeout from
> cinder-volume. Has anyone seen this issue? I'd like to see what happens
> on the Ceph side. Which service should I look into? Is it stuck
> on a mon or an OSD? Is there any option to enable debugging to get more details?
>
> oslo_messaging.rpc.server [req-7802dea8-15f6-4177-b07c-e5241615b777 d0dddad1fc7a4adf8ef5b185567e1842 b9adeeb6dbd54710a0b033ee49045b54 - default default] Exception during message handling: rbd.Timeout: [errno 110] error removing image
> oslo_messaging.rpc.server Traceback (most recent call last):
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
> oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch
> oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 196, in _do_dispatch
> oslo_messaging.rpc.server     result = func(ctxt, **new_args)
> oslo_messaging.rpc.server   File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-684>", line 2, in delete_volume
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/coordination.py", line 151, in _synchronized
> oslo_messaging.rpc.server     return f(*a, **k)
> oslo_messaging.rpc.server   File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-683>", line 2, in delete_volume
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/objects/cleanable.py", line 212, in wrapper
> oslo_messaging.rpc.server     result = f(*args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 917, in delete_volume
> oslo_messaging.rpc.server     new_status)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
> oslo_messaging.rpc.server     self.force_reraise()
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
> oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
> oslo_messaging.rpc.server     raise value
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 899, in delete_volume
> oslo_messaging.rpc.server     self.driver.delete_volume(volume)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/drivers/rbd.py", line 1160, in delete_volume
> oslo_messaging.rpc.server     _try_remove_volume(client, volume_name)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/utils.py", line 696, in _wrapper
> oslo_messaging.rpc.server     return r.call(f, *args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call
> oslo_messaging.rpc.server     return attempt.get(self._wrap_exception)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
> oslo_messaging.rpc.server     six.reraise(self.value[0], self.value[1], self.value[2])
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
> oslo_messaging.rpc.server     raise value
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
> oslo_messaging.rpc.server     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/drivers/rbd.py", line 1155, in _try_remove_volume
> oslo_messaging.rpc.server     self.RBDProxy().remove(client.ioctx, volume_name)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
> oslo_messaging.rpc.server     result = proxy_call(self._autowrap, f, *args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
> oslo_messaging.rpc.server     rv = execute(f, *args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
> oslo_messaging.rpc.server     six.reraise(c, e, tb)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
> oslo_messaging.rpc.server     raise value
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
> oslo_messaging.rpc.server     rv = meth(*args, **kwargs)
> oslo_messaging.rpc.server   File "rbd.pyx", line 1283, in rbd.RBD.remove
> oslo_messaging.rpc.server rbd.Timeout: [errno 110] error removing image
>
>
> Thanks!
> Tony
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx





