Re: Libvirt hosts freeze after ceph osd+mon problem

I am using librbd.

rbd map was only a test to see whether the problem is librbd-related. Both librbd and rbd map gave the same frozen result.
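
For reference, a minimal sketch of the two attach paths I compared (the pool, image and monitor names below are just placeholders, not my real ones):

# kernel client: map the image and hand the block device to qemu
rbd map rbd/vm-disk-1 --id admin        # creates e.g. /dev/rbd0

# librbd: the disk section of the libvirt domain XML, roughly
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <auth username='admin'>
    <secret type='ceph' uuid='...'/>
  </auth>
  <source protocol='rbd' name='rbd/vm-disk-1'>
    <host name='mon1.example.com' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>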

The node running the virtuals has the 4.9.0-3-amd64 kernel.

The two tested virtuals run the 4.9.0-3-amd64 kernel and the
4.10.17-2-pve kernel, respectively.

JP

On 7.11.2017 10:42, Wido den Hollander wrote:

On 7 November 2017 at 10:14, Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:


Additional info - it is not librbd-related. I mapped the disk through
rbd map and the result was the same - the virtuals were stuck/frozen.
It happened exactly when this appeared in my log:


Why aren't you using librbd? Is there a specific reason for that? With Qemu/KVM/libvirt I always suggest using librbd.

And in addition, what kernel version are you running?

Wido

Nov  7 10:01:27 imatic-hydra01 kernel: [2266883.493688] libceph: osd6 down

I can attach strace to the qemu process, and it keeps looping over this:

root@imatic-hydra01:/usr/local/libvirt/bin# strace -p 31963
strace: Process 31963 attached
ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7,
events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46,
events=POLLIN}], 6, {tv_sec=0, tv_nsec=355313847}, NULL, 8) = 0 (Timeout)
poll([{fd=10, events=POLLOUT}], 1, 0)   = 1 ([{fd=10,
revents=POLLOUT|POLLHUP}])
ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7,
events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46,
events=POLLIN}], 6, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
poll([{fd=10, events=POLLOUT}], 1, 0)   = 1 ([{fd=10,
revents=POLLOUT|POLLHUP}])
ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7,
events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46,
events=POLLIN}], 6, {tv_sec=0, tv_nsec=493273904}, NULL, 8) = 0 (Timeout)
Process 31963 detached
   <detached ...>

Can you please give me brief info on what I should debug and how to do
it? I'm a newbie at gdb debugging.
It is not a problem inside the virtual machine (like its disk not
responding), because I can't even get to the VNC console and there is
no kernel panic visible on it. Also, I suppose the kernel should still
answer pings even without the disk being available.

Thank you

With regards
Jan Pekar



On 7.11.2017 00:30, Jason Dillaman wrote:
If you could install the debug packages and get a gdb backtrace from all
threads it would be helpful. librbd doesn't utilize any QEMU threads so
even if librbd was deadlocked, the worst case that I would expect would
be your guest OS complaining about hung kernel tasks related to disk IO
(since the disk wouldn't be responding).
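
For what it's worth, one way to grab such a backtrace (assuming gdb and the debug symbol packages for qemu and librbd are installed, and using the qemu PID from the strace output above) would be roughly:

gdb -p 31963 -batch -ex 'set pagination off' -ex 'thread apply all bt' > qemu-threads.txt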

On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:

     Hi,

     I'm using Debian stretch with ceph 12.2.1-1~bpo80+1 and qemu
     1:2.8+dfsg-6+deb9u3.
     I'm running 3 nodes with 3 monitors and 8 OSDs, all on IPv6.

     While testing the cluster, I ran into a strange and severe problem.
     On the first node I'm running qemu guests with a librados disk
     connection to the cluster, with all 3 monitors listed in the connection.
     On the second node I stopped the mon and osd with the command

     kill -STOP MONPID OSDPID
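
     Note that SIGSTOP only suspends the processes - they and their TCP
     connections stay alive, so this simulates hung daemons rather than
     crashed ones. The daemons can later be resumed with:

     kill -CONT MONPID OSDPID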

     Within one minute all my qemu guests on the first node freeze, so
     that they don't even respond to ping. On the VNC screen there is no
     error (disk error or kernel panic); they just hang forever with no
     console response. Even starting the stopped MON and OSD again doesn't
     bring the guests back. Destroying the qemu domain and starting it
     again is the only solution.

     This happens even if the virtual machine has all of its primary OSDs
     on OSDs other than the one I stopped - so it is not writing its
     primary copies to the stopped OSD.
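
     For reference, one way to check which OSDs serve an image's objects
     is roughly the following (the pool, image and object names here are
     only placeholders):

     rbd info rbd/vm-disk-1
       # shows the image's block_name_prefix, e.g. rbd_data.102974b0dc51
     ceph osd map rbd rbd_data.102974b0dc51.0000000000000000
       # prints the PG and its up/acting OSD set (primary listed first)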

     If I stop only the OSD and keep the MON running, or stop only the MON
     and keep the OSD running, everything looks OK.

     When I stop both the MON and OSD, I can see "osd.0 1300
     heartbeat_check: no reply from ..." in the log, as usual when an OSD
     fails. While this is happening the virtuals are still running, but
     after that they all stop.

     What should I send you to debug this problem? Without a fix for this,
     ceph is not reliable for me.

     Thank you
     With regards
     Jan Pekar
     Imatic




--
Jason


--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx | +420603811737
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
============
--
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



