Dear all,

I have a Ceph cluster with two monitors. Earlier I tried to add a third monitor, but it got stuck syncing and refused to join the quorum. After that, the stores of the two existing monitors grew very large (~25 GB). I restarted one of the monitors (I had seen this suggested as a way to get the mon to clear its old, unwanted maps, but I can't find it in my browser's history now :D), and since then it keeps dropping in and out of the quorum.

I found that it is slow to respond to anything, even a request for the mon_state. The bottleneck is the disk: the monitor is continuously reading from it at full speed, currently at ~80 MB/s. I was trying to add reliability to the cluster and ended up with a broken one.

The cluster has 15 OSD nodes and 2 mons, with no MDS.

Any idea why the mon is reading this much data? What can I do to debug it? I tried setting the debug level higher, but I only get lines like these:

2015-05-23 03:10:40.289533 7f74b0ada700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.106:0/3041278 client protocol 0
2015-05-23 03:10:40.308461 7f74b03d3700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/1006517 client protocol 0
2015-05-23 03:10:40.308704 7f74b00d0700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/4006517 client protocol 0
2015-05-23 03:10:40.308771 7f74b01d1700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/2006517 client protocol 0
2015-05-23 03:10:40.308792 7f74affcf700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/3006517 client protocol 0
2015-05-23 03:10:40.309587 7f74aeebe700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/1007420 client protocol 0
2015-05-23 03:10:40.348127 7f74aeaba700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/1020905 client protocol 0
2015-05-23 03:10:40.351846 7f74ad6a6700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.106:0/2011199 client protocol 0
2015-05-23 03:10:40.365362 7f74acc9c700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/1023531 client protocol 0

and then things like:

2015-05-23 03:12:40.579872 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.579879 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.579883 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x252ecec0 next 190533 (onetime)
2015-05-23 03:12:40.579923 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.579928 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.579932 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x9426b40 next 190533 (onetime)
2015-05-23 03:12:40.579965 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.579970 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.579974 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x359d8100 next 190533 (onetime)
2015-05-23 03:12:40.580010 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580016 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580019 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0xfa99240 next 190533 (onetime)
2015-05-23 03:12:40.580053 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580058 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580061 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x1ac933c0 next 190533 (onetime)
2015-05-23 03:12:40.580094 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580099 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580102 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0xa538000 next 190533 (onetime)
2015-05-23 03:12:40.580135 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580140 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580143 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x1ac93140 next 190533 (onetime)
2015-05-23 03:12:40.581487 7f74b7f32700 5 mon.monitor01@0(electing).elector(655) handle_ack from mon.1
2015-05-23 03:12:40.581492 7f74b7f32700 5 mon.monitor01@0(electing).elector(655) so far i have {0=70368744177663,1=70368744177663}
2015-05-23 03:12:40.581497 7f74b7f32700 10 mon.monitor01@0(electing).elector(655) bump_epoch 655 to 656
2015-05-23 03:12:40.603273 7f74b7f32700 10 mon.monitor01@0(electing) e6 join_election
2015-05-23 03:12:40.603281 7f74b7f32700 10 mon.monitor01@0(electing) e6 _reset
2015-05-23 03:12:40.603283 7f74b7f32700 10 mon.monitor01@0(electing) e6 cancel_probe_timeout (none scheduled)
2015-05-23 03:12:40.603285 7f74b7f32700 10 mon.monitor01@0(electing) e6 timecheck_finish
2015-05-23 03:12:40.603286 7f74b7f32700 10 mon.monitor01@0(electing) e6 scrub_reset
2015-05-23 03:12:40.603315 7f74b7f32700 10 mon.monitor01@0(electing) e6 win_election epoch 656 quorum 0,1 features 70368744177663
2015-05-23 03:12:40.603326 7f74b7f32700 0 log_channel(cluster) log [INF] : mon.monitor01@0 won leader election with quorum 0,1
2015-05-23 03:12:40.615572 7f74a9262700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/1025658 client protocol 0
2015-05-23 03:12:40.657969 7f74a9060700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/2023531 client protocol 0
2015-05-23 03:12:40.681508 7f74a8e5e700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/1011519 client protocol 0
2015-05-23 03:12:40.686338 7f74b7f32700 10 mon.monitor01@0(leader).data_health(656) start_epoch epoch 656
2015-05-23 03:12:40.686352 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_finish
2015-05-23 03:12:40.686357 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686360 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275596 192.168.213.103:0/2004855 is open for client.17275596 192.168.213.103:0/2004855
2015-05-23 03:12:40.686371 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686402 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686405 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272524 192.168.213.103:0/2012444 is open for client.17272524 192.168.213.103:0/2012444
2015-05-23 03:12:40.686413 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686435 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686438 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.18215582 192.168.213.106:0/4041278 is open for client.18215582 192.168.213.106:0/4041278
2015-05-23 03:12:40.686447 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686496 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686500 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273120 192.168.213.103:0/1019238 is open for client.17273120 192.168.213.103:0/1019238
2015-05-23 03:12:40.686509 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686535 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686536 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272886 192.168.213.103:0/2015460 is open for client.17272886 192.168.213.103:0/2015460
2015-05-23 03:12:40.686540 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686550 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686552 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272970 192.168.213.103:0/1017082 is open for client.17272970 192.168.213.103:0/1017082
2015-05-23 03:12:40.686567 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686577 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686578 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.18216138 192.168.213.103:0/3038561 is open for client.18216138 192.168.213.103:0/3038561
2015-05-23 03:12:40.686582 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686592 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686593 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272520 192.168.213.103:0/1012444 is open for client.17272520 192.168.213.103:0/1012444
2015-05-23 03:12:40.686597 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686608 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686610 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.19621431 192.168.213.103:0/2025791 is open for client.19621431 192.168.213.103:0/2025791
2015-05-23 03:12:40.686613 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686623 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686624 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273004 192.168.213.103:0/1017826 is open for client.17273004 192.168.213.103:0/1017826
2015-05-23 03:12:40.686628 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686638 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686639 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17436946 192.168.213.101:0/2040116 is open for client.17436946 192.168.213.101:0/2040116
2015-05-23 03:12:40.686643 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686653 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686654 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275494 192.168.213.103:0/1002572 is open for client.17275494 192.168.213.103:0/1002572
2015-05-23 03:12:40.686658 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686668 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686669 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.18216140 192.168.213.103:0/4038561 is open for client.18216140 192.168.213.103:0/4038561
2015-05-23 03:12:40.686673 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686683 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686684 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272872 192.168.213.103:0/1015258 is open for client.17272872 192.168.213.103:0/1015258
2015-05-23 03:12:40.686688 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686697 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686699 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275544 192.168.213.101:0/2033630 is open for client.17275544 192.168.213.101:0/2033630
2015-05-23 03:12:40.686703 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686723 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686725 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275542 192.168.213.101:0/1033630 is open for client.17275542 192.168.213.101:0/1033630
...
...
...
2015-05-23 03:12:40.687360 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.687370 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.687372 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17274530 192.168.213.103:0/1031546 is open for client.17274530 192.168.213.103:0/1031546
2015-05-23 03:12:40.687376 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.687387 7f74b7f32700 10 mon.monitor01@0(leader) e6 resend_routed_requests
2015-05-23 03:12:40.687389 7f74b7f32700 10 mon.monitor01@0(leader) e6 register_cluster_logger - already registered
2015-05-23 03:12:40.687391 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start
2015-05-23 03:12:40.687392 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start_round curr 0
2015-05-23 03:12:40.687394 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start_round new 1
2015-05-23 03:12:40.687395 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck
2015-05-23 03:12:40.687396 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck start timecheck epoch 656 round 1
2015-05-23 03:12:40.687402 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck send time_check( ping e 656 r 1 ) v1 to mon.1 192.168.217.203:6789/0
2015-05-23 03:12:40.687413 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start_round setting up next event
2015-05-23 03:12:40.690522 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.690524 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273862 192.168.213.103:0/1025658 is open for client.17273862 192.168.213.103:0/1025658
2015-05-23 03:12:40.690528 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.690545 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.690547 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273598 192.168.213.103:0/2023531 is open for client.17273598 192.168.213.103:0/2023531
2015-05-23 03:12:40.690550 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.690566 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.690568 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272358 192.168.213.103:0/1011519 is open for client.17272358 192.168.213.103:0/1011519
2015-05-23 03:12:40.690571 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.690590 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690593 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690595 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x252ecec0 next 190533 (onetime)
2015-05-23 03:12:40.690615 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690618 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690620 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x9426b40 next 190533 (onetime)
2015-05-23 03:12:40.690637 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690640 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690642 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x359d8100 next 190533 (onetime)
2015-05-23 03:12:40.690659 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690662 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690663 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0xfa99240 next 190533 (onetime)
2015-05-23 03:12:40.690681 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690683 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690685 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x1ac933c0 next 190533 (onetime)
2015-05-23 03:12:40.690701 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690704 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690705 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0xa538000 next 190533 (onetime)
2015-05-23 03:12:40.690724 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690727 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690729 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x1ac93140 next 190533 (onetime)
2015-05-23 03:12:40.690746 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690749 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690750 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x252ecec0 next 190533 (onetime)

Any clue why I am facing this issue? Any help is appreciated.

Thanks
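To confirm from the OS side that the ~80 MB/s of disk reads really comes from the ceph-mon process, one option on Linux is to sample the `read_bytes` counter in `/proc/<pid>/io` twice and divide the difference by the elapsed time. This is a minimal sketch, assuming a Linux host and permission to read that file (usually root); the script name `mon_io.py` and the helper functions are hypothetical, not part of any Ceph tool.

```python
import sys
import time
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value.strip())
    return counters


def read_rate_mb_s(before: Dict[str, int], after: Dict[str, int], seconds: float) -> float:
    """Rate of bytes actually fetched from storage (read_bytes), in MB/s (10**6 bytes)."""
    return (after["read_bytes"] - before["read_bytes"]) / seconds / 1e6


def sample(pid: int) -> Dict[str, int]:
    """Read the current I/O counters of one process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage, as root: python mon_io.py $(pidof ceph-mon)
    pid = int(sys.argv[1])
    start = time.monotonic()
    first = sample(pid)
    time.sleep(5.0)
    second = sample(pid)
    elapsed = time.monotonic() - start
    print(f"ceph-mon read rate: {read_rate_mb_s(first, second, elapsed):.1f} MB/s")
```

`read_bytes` is used rather than `rchar` because `rchar` also counts reads served from the page cache, while `read_bytes` only counts what the process caused to be fetched from the storage layer, which is what a saturated disk would reflect.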
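With the debug level raised, most of the log volume is per-client noise, while the quorum flapping only shows up as the state in parentheses changing (for example `electing` to `leader`). A minimal sketch that reports only those state changes, assuming the line layout seen in the excerpts above (timestamp, thread id, debug level, then `mon.<name>@<rank>(<state>)`); the function `state_transitions`, the script name, and the log path in the usage comment are hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed layout, taken from the log excerpts above:
# "<date> <time> <thread> <level> mon.<name>@<rank>(<state>)..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+)\s+"
    r"mon\.(?P<name>[^@\s]+)@(?P<rank>\d+)\((?P<state>[a-z]+)\)"
)


def state_transitions(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_state, new_state) for every change of mon state."""
    transitions = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Entries such as "log_channel(cluster) ..." carry no state; skip them.
            continue
        state = match.group("state")
        if previous is not None and state != previous:
            transitions.append((match.group("ts"), previous, state))
        previous = state
    return transitions


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python mon_states.py /var/log/ceph/ceph-mon.monitor01.log
    with open(sys.argv[1]) as log:
        for ts, old, new in state_transitions(log):
            print(f"{ts}  {old} -> {new}")
```

Lining up these timestamps against the disk read rate would show whether the elections coincide with the periods of heavy reading.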
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com