Re: Monitor stay in synchronizing state for over 24hour

Hi all,
Does anyone have any ideas?

Alternatively, could someone point me to the debug logging I should enable to see how the synchronization is progressing?
Currently I have set:
debug_mon=20
mon_sync_debug=true

However, I am not sure which log entries I should be looking at.


Thanks in advance

BR,
Luke
MYCOM-OSI


From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Luke Kao [Luke.Kao@xxxxxxxxxxxxx]
Sent: Thursday, March 12, 2015 5:22 PM
To: ceph-users@xxxxxxxx
Subject: Monitor stay in synchronizing state for over 24hour

Hello everyone,
I am currently trying to recover a Ceph cluster after a disaster. I now have enough OSDs (171 up and in, out of 195), and 2 incomplete PGs remain.

However, my question is not about the incomplete PGs. It is about one mon service that failed to start because it was using a strange, wrong monmap. After I injected a monmap exported from the cluster, the mon came up and entered the synchronizing state, but it still had not rejoined after several hours. I originally assumed this was normal, since the whole cluster is still busy recovering and backfilling. However, it has now been over 24 hours, and there is no indication of when the sync will finish or whether it is still healthy.

The log shows that the mon is still synchronizing, and I can see the files under store.db being updated.


A small piece of the log for reference:
2015-03-12 03:20:15.025048 7f3cb6c48700 10 mon.NVMBD1CIF290D00@0(synchronizing).data_health(0) service_tick
2015-03-12 03:20:15.025075 7f3cb6c48700  0 mon.NVMBD1CIF290D00@0(synchronizing).data_health(0) update_stats avail 71% total 103080888 used 24281956 avail 73539668
2015-03-12 03:20:30.460672 7f3cb4b43700 10 -- 10.137.36.30:6789/0 >> 10.137.36.31:6789/0 pipe(0x3528280 sd=9 :57111 s=2 pgs=30630 cs=15 l=0 c=0x34b1760).aborted = 0
2015-03-12 03:20:30.460923 7f3cb4b43700 10 -- 10.137.36.30:6789/0 >> 10.137.36.31:6789/0 pipe(0x3528280 sd=9 :57111 s=2 pgs=30630 cs=15 l=0 c=0x34b1760).reader got message 1466470577 0x45b3c80 mon_sync(chunk cookie 37950063980 lc 12343379 bl 791970 bytes last_key logm,full_5120265) v2
2015-03-12 03:20:30.460963 7f3cbc783700 10 -- 10.137.36.30:6789/0 >> 10.137.36.31:6789/0 pipe(0x3528280 sd=9 :57111 s=2 pgs=30630 cs=15 l=0 c=0x34b1760).writer: state = open policy.server=0
2015-03-12 03:20:30.460988 7f3cbc783700 10 -- 10.137.36.30:6789/0 >> 10.137.36.31:6789/0 pipe(0x3528280 sd=9 :57111 s=2 pgs=30630 cs=15 l=0 c=0x34b1760).write_ack 1466470577
2015-03-12 03:20:30.461011 7f3cbc783700 10 -- 10.137.36.30:6789/0 >> 10.137.36.31:6789/0 pipe(0x3528280 sd=9 :57111 s=2 pgs=30630 cs=15 l=0 c=0x34b1760).writer: state = open policy.server=0
2015-03-12 03:20:30.461030 7f3cb6447700  1 -- 10.137.36.30:6789/0 <== mon.1 10.137.36.31:6789/0 1466470577 ==== mon_sync(chunk cookie 37950063980 lc 12343379 bl 791970 bytes last_key logm,full_5120265) v2 ==== 792163+0+0 (2147002791 0 0) 0x45b3c80 con 0x34b1760
2015-03-12 03:20:30.461048 7f3cb6447700 10 mon.NVMBD1CIF290D00@0(synchronizing) e1 handle_sync mon_sync(chunk cookie 37950063980 lc 12343379 bl 791970 bytes last_key logm,full_5120265) v2
2015-03-12 03:20:30.461052 7f3cb6447700 10 mon.NVMBD1CIF290D00@0(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 37950063980 lc 12343379 bl 791970 bytes last_key logm,full_5120265) v2
2015-03-12 03:20:30.463832 7f3cb6447700 10 mon.NVMBD1CIF290D00@0(synchronizing) e1 sync_reset_timeout
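
One rough way to judge whether the sync is advancing is to track the last_key reported in successive handle_sync_chunk entries like the ones above. The Python sketch below is illustrative only: the regular expression is inferred solely from the sample line in this excerpt, the names SyncChunk, parse_chunks and summarize are hypothetical helpers (not part of Ceph), and treating a changed cookie as a restarted sync session is an assumption.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class SyncChunk(NamedTuple):
    timestamp: str
    cookie: int         # value printed after "cookie"
    lc: int             # value printed after "lc"
    payload_bytes: int  # value printed after "bl"
    prefix: str         # first half of last_key, e.g. "logm"
    key: str            # second half of last_key, e.g. "full_5120265"


# Pattern inferred from the handle_sync_chunk line in the excerpt above;
# other Ceph versions may format this message differently.
CHUNK_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"handle_sync_chunk mon_sync\(chunk cookie (?P<cookie>\d+) "
    r"lc (?P<lc>\d+) bl (?P<bl>\d+) bytes "
    r"last_key (?P<prefix>[^,]+),(?P<key>[^)]+)\)"
)

# The sample entry from the log excerpt, used here as a self-check.
SAMPLE = (
    "2015-03-12 03:20:30.461052 7f3cb6447700 10 "
    "mon.NVMBD1CIF290D00@0(synchronizing) e1 handle_sync_chunk "
    "mon_sync(chunk cookie 37950063980 lc 12343379 bl 791970 bytes "
    "last_key logm,full_5120265) v2"
)


def parse_chunks(lines: Iterable[str]) -> Iterator[SyncChunk]:
    """Yield one SyncChunk per handle_sync_chunk line; skip everything else."""
    for line in lines:
        m = CHUNK_RE.search(line)
        if m:
            yield SyncChunk(
                m["ts"], int(m["cookie"]), int(m["lc"]),
                int(m["bl"]), m["prefix"], m["key"],
            )


def summarize(chunks: Iterable[SyncChunk]) -> dict:
    """Collect simple progress indicators from the parsed chunks."""
    items = list(chunks)
    return {
        "chunks": len(items),
        "bytes": sum(c.payload_bytes for c in items),
        # Assumption: more than one cookie suggests the sync was restarted.
        "cookies": sorted({c.cookie for c in items}),
        "first_key": (items[0].prefix, items[0].key) if items else None,
        "last_key": (items[-1].prefix, items[-1].key) if items else None,
    }
```

To use it on a real log, pass an open file, for example summarize(parse_chunks(open(path))) where path points at the mon log. If last_key keeps changing between runs while the cookie list stays the same, the sync is probably still moving forward; a last_key that repeats or a growing list of cookies would be worth investigating.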


I am also wondering whether some OSDs are failing to join the cluster because of this. Some OSD processes come up without error, but after loading PGs they never proceed to boot, and their status remains down and out.

Please advise, thanks


Luke Kao

MYCOM OSI




This electronic message contains information from Mycom which may be privileged or confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or any other use of the contents of this information is prohibited. If you have received this electronic message in error, please notify us by post or telephone (to the numbers or correspondence address above) or by email (at the email address above) immediately.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
