How big is the mon's DB? As in just the total size of the directory
you copied?

FWIW, I recently had to perform mon surgery on a 14.2.4 (or was it
14.2.2?) cluster with an 8 GB mon store, and I encountered no such
problems while syncing a new mon, which took 10 minutes or so.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Oct 14, 2019 at 9:41 PM Nikola Ciprich
<nikola.ciprich@xxxxxxxxxxx> wrote:
>
> On Mon, Oct 14, 2019 at 04:31:22PM +0200, Nikola Ciprich wrote:
> > On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
> > > Probably the same problem here. When I try to add another MON, "ceph
> > > health" becomes mostly unresponsive. One of the existing ceph-mon
> > > processes uses 100% CPU for several minutes. Tried it on 2 test
> > > clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 HDD OSDs
> > > each). To avoid errors like "lease timeout", I temporarily increased
> > > "mon lease" from 5 to 50 seconds.
> > >
> > > Not sure how bad it is from a customer PoV. But it is a problem in
> > > itself to be without "ceph health" for several minutes when there is
> > > an increased risk of losing the quorum ...
> >
> > Hi Harry,
> >
> > thanks a lot for your reply! I'm not sure we're experiencing the same
> > issue; I don't have it on any other cluster. When this happens to you,
> > does only "ceph health" stop working, or does it also block all client IO?
> >
> > BR
> >
> > nik
> >
> > > Harry
> > >
> > > On 13.10.19 20:26, Nikola Ciprich wrote:
> > > > dear ceph users and developers,
> > > >
> > > > on one of our production clusters, we got into a pretty unpleasant
> > > > situation.
> > > >
> > > > After rebooting one of the nodes, when trying to start the monitor,
> > > > the whole cluster seems to hang, including IO, "ceph -s", etc. When
> > > > this mon is stopped again, everything continues. Trying to spawn a
> > > > new monitor leads to the same problem (even on a different node).
> > > >
> > > > I had to give up after minutes of outage, since that's unacceptable.
> > > > I think we had this problem once in the past on this cluster, but
> > > > after some (much shorter) time the monitor joined, and it has worked
> > > > fine since then.
> > > >
> > > > All cluster nodes are CentOS 7 machines, I have 3 monitors (so 2 are
> > > > now running), and I'm using Ceph 13.2.6.
> > > >
> > > > The network connection seems to be fine.
> > > >
> > > > Has anyone seen a similar problem? I'd be very grateful for tips on
> > > > how to debug and solve this.
> > > >
> > > > For those interested, here's the log of one of the running monitors
> > > > with debug_mon set to 10/10:
> > > >
> > > > https://storage.lbox.cz/public/d258d0
> > > >
> > > > If I can provide more info, please let me know.
> > > >
> > > > with best regards
> > > >
> > > > nikola ciprich
>
> just to add a quick update: I was able to reproduce the issue by
> transferring the monitor directories to a test environment with the same
> IP addressing, so I can safely play with it now.
>
> Increasing the lease timeout didn't fix the problem,
> but at least I seem to be able to use "ceph -s" now.
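The mon store size Paul asks about, and the lease bump mentioned above, can
both be checked and applied from the shell. A minimal sketch, assuming the
default data path, a cluster named "ceph", and a mon ID equal to the short
hostname; adjust to your setup:

    # total on-disk size of the monitor's data directory (Paul's question)
    du -sh /var/lib/ceph/mon/ceph-$(hostname -s)

    # temporarily raise the mon lease from the default 5 s to 50 s;
    # injected at runtime, so it reverts when the mons restart
    ceph tell 'mon.*' injectargs '--mon_lease 50'

On 13.2.x/14.2.x the setting should also be persistable via the central
config database with "ceph config set mon mon_lease 50".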
>
> A few things I noticed in the meantime:
>
> - when I start the problematic monitor, monitor slow ops start to appear
>   for the quorum leader, and the count slowly increases:
>
>   44 slow ops, oldest one blocked for 130 sec, mon.nodev1c has slow ops
>
> - removing and recreating the monitor didn't help
>
> - checking mon_status of the problematic monitor shows it remains in the
>   "synchronizing" state
>
> I tried increasing debug_ms and debug_paxos but didn't see anything
> useful there.
>
> I will report further when I have something. If anyone has any idea in
> the meantime, please let me know.
>
> BR
>
> nik
>
> --
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
>
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis@xxxxxxxxxxx
> -------------------------------------
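For anyone hitting the same symptoms, the "synchronizing" state and the slow
ops described above can be inspected through the monitor admin socket. A
minimal sketch, using the mon ID "nodev1c" from the example above (substitute
your own) and assuming the commands are run on the host where that mon lives:

    # current state of this monitor: "synchronizing", "probing",
    # "electing", "peon", "leader", ...
    ceph daemon mon.nodev1c mon_status

    # in-flight (and slow) ops currently held by this monitor
    ceph daemon mon.nodev1c ops

    # raise mon/paxos debugging at runtime, as in the log linked above
    ceph tell 'mon.*' injectargs '--debug_mon 10/10 --debug_paxos 10/10'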