That's what I am trying to figure out: what exactly could cause a timeout.

The user creates 10 VMs (each boots from a volume and has an attached volume) with Terraform, then destroys them. Repeating this works fine most of the time, but a timeout sometimes happens at different places, during volume creation or volume deletion. Since Terraform manages resources in parallel (10 at a time by default), I am not sure whether it matters how cinder-volume handles those requests. I doubt I can reproduce it with rbd directly.

I will enable debug logging in cinder-volume to get more info. In the meantime, I wonder how I can get more info from Ceph to understand such timeouts better.

Thanks!
Tony
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: September 8, 2021 01:05 AM
To: ceph-users@xxxxxxx
Subject: Re: debug RBD timeout issue

Hi,

from an older cloud version I remember having to increase these settings:

[DEFAULT]
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10
block_device_creation_timeout = 300

The question is what exactly could cause a timeout. You write that you only see these timeouts from time to time, so you should try to find out what the difference is between successful and failing volumes. Is it the size or anything else? Which glance stores are enabled? Can you reproduce it, for example with 'rbd create...' as the cinder user? Then you could increase 'debug_rbd' and see if that reveals anything.

Quoting Tony Liu <tonyliu0592@xxxxxxxxxxx>:

> Hi,
>
> I have OpenStack Ussuri and Ceph Octopus. Sometimes, I see a timeout
> when creating or deleting volumes. I can see the RBD timeout from
> cinder-volume. Has anyone seen such an issue? I'd like to see what
> happens on Ceph. Which service should I look into? Is it stuck
> with a mon or any OSD? Is there any option to enable debugging to get
> more details?
>
> oslo_messaging.rpc.server [req-7802dea8-15f6-4177-b07c-e5241615b777 d0dddad1fc7a4adf8ef5b185567e1842 b9adeeb6dbd54710a0b033ee49045b54 - default default] Exception during message handling: rbd.Timeout: [errno 110] error removing image
> oslo_messaging.rpc.server Traceback (most recent call last):
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
> oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch
> oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 196, in _do_dispatch
> oslo_messaging.rpc.server     result = func(ctxt, **new_args)
> oslo_messaging.rpc.server   File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-684>", line 2, in delete_volume
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/coordination.py", line 151, in _synchronized
> oslo_messaging.rpc.server     return f(*a, **k)
> oslo_messaging.rpc.server   File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-683>", line 2, in delete_volume
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/objects/cleanable.py", line 212, in wrapper
> oslo_messaging.rpc.server     result = f(*args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 917, in delete_volume
> oslo_messaging.rpc.server     new_status)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
> oslo_messaging.rpc.server     self.force_reraise()
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
> oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
> oslo_messaging.rpc.server     raise value
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 899, in delete_volume
> oslo_messaging.rpc.server     self.driver.delete_volume(volume)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/drivers/rbd.py", line 1160, in delete_volume
> oslo_messaging.rpc.server     _try_remove_volume(client, volume_name)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/utils.py", line 696, in _wrapper
> oslo_messaging.rpc.server     return r.call(f, *args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call
> oslo_messaging.rpc.server     return attempt.get(self._wrap_exception)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
> oslo_messaging.rpc.server     six.reraise(self.value[0], self.value[1], self.value[2])
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
> oslo_messaging.rpc.server     raise value
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
> oslo_messaging.rpc.server     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/cinder/volume/drivers/rbd.py", line 1155, in _try_remove_volume
> oslo_messaging.rpc.server     self.RBDProxy().remove(client.ioctx, volume_name)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
> oslo_messaging.rpc.server     result = proxy_call(self._autowrap, f, *args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
> oslo_messaging.rpc.server     rv = execute(f, *args, **kwargs)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
> oslo_messaging.rpc.server     six.reraise(c, e, tb)
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
> oslo_messaging.rpc.server     raise value
> oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
> oslo_messaging.rpc.server     rv = meth(*args, **kwargs)
> oslo_messaging.rpc.server   File "rbd.pyx", line 1283, in rbd.RBD.remove
> oslo_messaging.rpc.server rbd.Timeout: [errno 110] error removing image
>
>
> Thanks!
> Tony
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx