On 2018-05-25 12:11, Josef Zelenka wrote:
Hi, we are running a Jewel cluster (54 OSDs, six nodes, Ubuntu 16.04) that serves as a backend for OpenStack (Newton) VMs. Today we had to reboot one of the nodes (replicated pool, 2x) and some of our VMs oopsed with issues in their FS (mainly database VMs, PostgreSQL) - is there a reason for this to happen? If data is replicated, the VMs shouldn't even notice we rebooted one of the nodes, right? Maybe I just don't understand how this works correctly, but I hope someone around here can either tell me why this is happening or how to fix it.

Thanks
Josef
It could be a timeout setting issue. Typically your higher, application-level timeouts should be larger than your low-level I/O timeouts to allow for recovery. Check whether your PostgreSQL has timeouts that may be set too low.

At the low level, the OSD will be detected as failed via osd_heartbeat_grace + osd_heartbeat_interval; you can lower this to, for example, 20s via:

  osd heartbeat grace = 15
  osd heartbeat interval = 5

This gives 20 sec before an OSD is reported as dead and remapping occurs. Do not lower it too much, else you may trigger remaps on false alarms.
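As a minimal sketch of where those settings could go (assuming ceph.conf on the OSD nodes; the section choice and runtime injection are my assumptions, values illustrative):

  [global]
  # osd_heartbeat_grace is also consulted by the mons, so [global] is a common place for it
  osd heartbeat grace = 15

  [osd]
  osd heartbeat interval = 5

  # these may also be injectable at runtime, without restarting the OSDs:
  ceph tell osd.* injectargs '--osd_heartbeat_grace 15 --osd_heartbeat_interval 5'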
At the higher Ceph client level, it may be worth double checking:

  rados_osd_op_timeout in the case of librbd
  osd_request_timeout in the case of kernel rbd (if enabled)

These need to be larger than the OSD timeouts above.
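A rough sketch of how these might be set (pool/image name and values are hypothetical; which one applies depends on whether your OpenStack attaches volumes via librbd/QEMU or kernel rbd):

  # librbd (QEMU/libvirt volumes) - ceph.conf on the hypervisor
  [client]
  rados osd op timeout = 60

  # kernel rbd - passed as a map option, if your kernel supports it
  rbd map volumes/test-image -o osd_request_timeout=60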
Higher still, there is the OS disk timeout inside the VM (this is usually high enough):

  /sys/block/sdX/device/timeout
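For instance, from inside the VM (sdX is whatever SCSI device backs the filesystem; the 180s value is only illustrative - note that virtio-blk devices such as vdX do not expose this knob):

  # check the current SCSI disk timeout (seconds)
  cat /sys/block/sda/device/timeout
  # raise it if needed
  echo 180 > /sys/block/sda/device/timeout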
Finally, your PostgreSQL timeouts need to be higher than 20s in this case.
/Maged
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com