Libvirt hosts freeze after ceph osd+mon problem





Hi,

I'm using Debian Stretch with Ceph 12.2.1-1~bpo80+1 and QEMU 1:2.8+dfsg-6+deb9u3.
I'm running 3 nodes with 3 monitors and 8 OSDs on them, all on IPv6.

While testing the cluster, I found a strange and severe problem.
On the first node I'm running QEMU guests whose disks connect to the cluster through librados, with all 3 monitors listed in the connection settings.
On the second node I stopped the MON and the OSD with the command:

kill -STOP MONPID OSDPID
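For anyone trying to reproduce this: SIGSTOP cannot be caught or ignored, and a stopped process keeps its sockets open, so peers see no connection reset and can only notice the failure through timeouts. This differs from killing the daemon outright. Below is a minimal Python sketch of the freeze/thaw fault injection, assuming Linux. It uses a hypothetical stand-in child process (`start_dummy_daemon`) instead of ceph-mon/ceph-osd, because `os.waitpid` can only confirm the stopped/continued state of the script's own children; for the real daemons, only the `os.kill` calls apply.

```python
import os
import signal
import subprocess
import sys


def start_dummy_daemon() -> subprocess.Popen:
    """Hypothetical stand-in for a ceph daemon: a child that just sleeps."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def freeze(proc: subprocess.Popen) -> bool:
    """Send SIGSTOP (like `kill -STOP PID`) and wait until the child is stopped."""
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid report the stop without reaping the child
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def thaw(proc: subprocess.Popen) -> bool:
    """Send SIGCONT (like `kill -CONT PID`) and wait until the child resumes."""
    os.kill(proc.pid, signal.SIGCONT)
    # WCONTINUED makes waitpid report the resume event
    _, status = os.waitpid(proc.pid, os.WCONTINUED)
    return os.WIFCONTINUED(status)


if __name__ == "__main__":
    child = start_dummy_daemon()
    print("stopped:", freeze(child))
    print("continued:", thaw(child))
    child.kill()
    child.wait()
```

Note that resuming a SIGSTOPped daemon requires SIGCONT; the process was never terminated, only frozen.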

Within one minute all QEMU guests on the first node freeze; they don't even respond to ping. The VNC console shows no error (no disk error or kernel panic); they just hang forever with no console response. Even starting the MON and OSD again on the stopped host doesn't bring them back. Destroying the QEMU domain and starting it again is the only solution.

This happens even if all of a virtual machine's primary OSDs are OSDs other than the one I stopped, so it is not writing to the stopped OSD as primary.

If I stop only the OSD and the MON keeps running, or stop only the MON and the OSD keeps running, everything looks OK.

When I stop both the MON and the OSD, I can see "osd.0 1300 heartbeat_check: no reply from ..." in the log, as usual when an OSD fails. At that point the virtual machines are still running, but shortly afterwards they all stop.
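To line up the moment the guests freeze with the heartbeat_check messages, a simple probe that timestamps when a guest stops answering could help. The sketch below is an assumption, not part of the original setup: it swaps ICMP ping for a TCP connect to a service on the guest (for example SSH on port 22), because raw ICMP needs root privileges. `socket.create_connection` resolves IPv6 addresses as well as IPv4. The host and port are placeholders.

```python
import socket
import time
from typing import Optional


def is_responsive(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, or timed out all count as "not responding"
        return False


def seconds_until_unresponsive(host: str, port: int, interval: float = 1.0,
                               max_checks: int = 120) -> Optional[float]:
    """Probe every `interval` seconds; return elapsed seconds at the first
    failed probe, or None if the guest answered all `max_checks` probes."""
    start = time.monotonic()
    for _ in range(max_checks):
        if not is_responsive(host, port):
            return time.monotonic() - start
        time.sleep(interval)
    return None


if __name__ == "__main__":
    # placeholder guest address and port; start this right before `kill -STOP`
    print(seconds_until_unresponsive("guest.example", 22))
```

Starting the probe immediately before the kill -STOP command gives a rough delay that can be compared with the timestamps of the heartbeat messages in the OSD log.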

What information should I send to help debug this problem? Until this is fixed, Ceph is not reliable for me.

Thank you
With regards
Jan Pekar
Imatic
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




