> On 7 November 2017 at 10:14, Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:
>
> Additional info - it is not librbd related. I mapped the disk through rbd map and it was the same - the virtuals were stuck/frozen.
> It happened exactly when this appeared in my log:

Why aren't you using librbd? Is there a specific reason for that? With Qemu/KVM/libvirt I always suggest using librbd.

In addition, what kernel version are you running?

Wido

> Nov 7 10:01:27 imatic-hydra01 kernel: [2266883.493688] libceph: osd6 down
>
> I can attach with strace to the qemu process and I get this running in a loop:
>
> root@imatic-hydra01:/usr/local/libvirt/bin# strace -p 31963
> strace: Process 31963 attached
> ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46, events=POLLIN}], 6, {tv_sec=0, tv_nsec=355313847}, NULL, 8) = 0 (Timeout)
> poll([{fd=10, events=POLLOUT}], 1, 0) = 1 ([{fd=10, revents=POLLOUT|POLLHUP}])
> ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46, events=POLLIN}], 6, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
> poll([{fd=10, events=POLLOUT}], 1, 0) = 1 ([{fd=10, revents=POLLOUT|POLLHUP}])
> ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46, events=POLLIN}], 6, {tv_sec=0, tv_nsec=493273904}, NULL, 8) = 0 (Timeout)
> Process 31963 detached
> <detached ...>
>
> Can you please give me brief info on what I should debug and how to do it? I'm a newbie at gdb debugging.
> It is not a problem inside the virtual machine (like the disk not responding), because I can't even get to the VNC console and there is no kernel panic visible on it. Also, I would expect the kernel to keep answering pings even without the disk being available.
>
> Thank you
>
> With regards
> Jan Pekar
>
>
> On 7.11.2017 00:30, Jason Dillaman wrote:
> > If you could install the debug packages and get a gdb backtrace from all threads, it would be helpful. librbd doesn't utilize any QEMU threads, so even if librbd were deadlocked, the worst case I would expect would be your guest OS complaining about hung kernel tasks related to disk IO (since the disk wouldn't be responding).
> >
> > On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I'm using Debian stretch with ceph 12.2.1-1~bpo80+1 and qemu 1:2.8+dfsg-6+deb9u3.
> > I'm running 3 nodes with 3 monitors and 8 OSDs, all on IPv6.
> >
> > When I tested the cluster, I ran into a strange and severe problem.
> > On the first node I'm running qemu guests with a librados disk connection to the cluster, with all 3 monitors listed in the connection.
> > On the second node I stopped the mon and an osd with the command
> >
> > kill -STOP MONPID OSDPID
> >
> > Within one minute all my qemu guests on the first node freeze, so they don't even respond to ping. On the VNC screen there is no error (no disk error or kernel panic); they just hang forever with no console response. Even starting the MON and OSD on the stopped host again doesn't bring them back. Destroying the qemu domain and starting it again is the only solution.
> >
> > This happens even if the virtual machine has all its primary OSDs on OSDs other than the one I stopped - so it is not writing primarily to the stopped OSD.
> >
> > If I stop only the OSD and the MON keeps running, or I stop only the MON and the OSD keeps running, everything looks OK.
> >
> > When I stop the MON and the OSD, I can see "osd.0 1300 heartbeat_check: no reply from ..." in the log, as usual when an OSD fails. During this the virtuals are still running, but after that they all stop.
> >
> > What should I send you to debug this problem? Without fixing this, ceph is not reliable for me.
> >
> > Thank you
> > With regards
> > Jan Pekar
> > Imatic
> >
> > --
> > Jason
>
> --
> ============
> Ing. Jan Pekař
> jan.pekar@xxxxxxxxx | +420603811737
> ----
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> ============
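
A minimal sketch of how the gdb backtrace Jason asked for could be captured, assuming the debug symbol packages for qemu and the ceph libraries (librbd/librados) are installed; the PID 31963 is the one from the strace session above and should be replaced with the PID of the actual frozen qemu process, and the output file name is arbitrary:

    # attach to the running qemu process, print a backtrace of every thread,
    # then detach again (the process is left running, or in this case still frozen)
    gdb -p 31963 -batch -ex "set pagination off" -ex "thread apply all bt" > qemu-backtrace.txt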
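
As a side note on the librbd question raised above, the two ways of attaching an RBD image differ in which client does the work. A rough sketch, where the pool/image name rbd/vm-disk-1 and the client id admin are placeholders and only the relevant qemu drive option is shown:

    # kernel rbd ("rbd map"): the kernel client maps the image and exposes it as /dev/rbdX
    rbd map rbd/vm-disk-1

    # librbd: qemu opens the image itself in userspace via librbd, no kernel client involved
    qemu-system-x86_64 ... -drive format=raw,if=virtio,file=rbd:rbd/vm-disk-1:id=admin:conf=/etc/ceph/ceph.conf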