Re: Issues with RBD when rebooting

On 2018-05-25 12:11, Josef Zelenka wrote:

Hi, we are running a jewel cluster (54 OSDs, six nodes, Ubuntu 16.04) that serves as a backend for OpenStack (newton) VMs. Today we had to reboot one of the nodes (replicated pool, x2) and some of our VMs oopsed with filesystem issues (mainly database VMs, postgresql). Is there a reason for this to happen? If the data is replicated, the VMs shouldn't even notice we rebooted one of the nodes, right? Maybe I just don't understand how this works, but I hope someone around here can either tell me why this is happening or how to fix it.

Thanks

Josef

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

It could be a timeout setting issue. Typically your higher, application-level timeouts should be larger than your low-level I/O timeouts, so that the lower layers have time to recover before the application gives up. Check whether your postgresql has timeouts that are set too low.
At the low level, an OSD is detected as failed after osd_heartbeat_grace + osd_heartbeat_interval. You can lower this to, for example, 20 s via:
osd heartbeat grace = 15
osd heartbeat interval = 5
This gives 20 seconds before the OSD is reported as dead and remapping begins. Do not lower it too much, or false alarms may trigger unnecessary remaps.
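
As a rough illustration of the arithmetic above, here is a minimal Python sketch that reads the two options from a ceph.conf-style fragment and adds them up. The [global] section name and the detection_window helper are my own assumptions for the example, not something Ceph ships; the grace + interval formula is simply the one stated above.

```python
import configparser


def detection_window(conf_text, section="global"):
    """Return osd_heartbeat_grace + osd_heartbeat_interval in seconds.

    Assumption: both options are set explicitly in `section`.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Option names may be written with spaces or underscores;
    # normalise to underscores before looking them up.
    opts = {key.replace(" ", "_"): value for key, value in parser[section].items()}
    grace = float(opts["osd_heartbeat_grace"])
    interval = float(opts["osd_heartbeat_interval"])
    return grace + interval


# Hypothetical fragment using the values suggested above.
EXAMPLE = """
[global]
osd heartbeat grace = 15
osd heartbeat interval = 5
"""

print(detection_window(EXAMPLE))  # 20.0
```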

At higher levels, it may be worth double checking:
rados_osd_op_timeout in case of librbd
osd_request_timeout in case of kernel rbd (if enabled)
If set, they need to be larger than the OSD timeouts above.
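
A small sketch of that comparison, assuming you have already collected the client-side values by hand. The too_low helper is hypothetical, and treating 0 as "no timeout" reflects my understanding of the defaults for these two options, so verify it against your versions.

```python
def too_low(client_timeouts, osd_window_s):
    """Return the names of client timeouts that would fire before the
    OSD failure is detected.

    client_timeouts: dict of option name -> seconds.
    Assumption: a value of 0 means the timeout is disabled, so it can
    never fire early.
    """
    return [
        name
        for name, seconds in client_timeouts.items()
        if seconds != 0 and seconds <= osd_window_s
    ]


# Hypothetical values, checked against the 20 s window from above.
settings = {"rados_osd_op_timeout": 10, "osd_request_timeout": 0}
print(too_low(settings, 20))  # ['rados_osd_op_timeout']
```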

One level further up, inside the VM, the OS disk timeout (usually high enough already) is at:
/sys/block/sdX/device/timeout
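
To list the current value for every disk at once, something like the following Python sketch should work on a Linux guest. The read_disk_timeouts name is made up for the example, and devices that do not expose a device/timeout file are simply skipped.

```python
from pathlib import Path


def read_disk_timeouts(sys_block="/sys/block"):
    """Map each block device name to its device/timeout value in seconds."""
    timeouts = {}
    for timeout_file in sorted(Path(sys_block).glob("*/device/timeout")):
        # .../<device>/device/timeout -> two parents up is the device name
        device = timeout_file.parent.parent.name
        timeouts[device] = int(timeout_file.read_text().strip())
    return timeouts


if __name__ == "__main__":
    for device, seconds in read_disk_timeouts().items():
        print(f"{device}: {seconds}s")
```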

Your postgresql timeouts need to be higher than 20 s in this case.

/Maged
