ceph monitor is very slow

Dears,
I have a Ceph cluster with two monitors.
Earlier I tried to add another monitor, but it got stuck syncing and refused to join the quorum.
After that, the stores of the two monitors I already had grew very large, ~25 GB each.
I restarted one of the monitors (following a suggestion for getting the mon to clear its old, unwanted maps; I cannot find it in my browser's history now :D ) and it then kept dropping in and out of the quorum.
I found that it is slow to respond to anything, even a request for the mon_status. The bottleneck is the disk:
the monitor reads from the disk continuously at full speed, at a rate of ~80 MB/s.
So I was trying to add reliability to the cluster and ended up with a broken one.
The cluster has 15 OSD nodes and 2 mons, no MDS.
Any idea why the mon is reading this much data? What can I do to debug it?
I tried raising the debug level, but I only get things like these:

2015-05-23 03:10:40.289533 7f74b0ada700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.106:0/3041278 client protocol 0
2015-05-23 03:10:40.308461 7f74b03d3700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/1006517 client protocol 0
2015-05-23 03:10:40.308704 7f74b00d0700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/4006517 client protocol 0
2015-05-23 03:10:40.308771 7f74b01d1700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/2006517 client protocol 0
2015-05-23 03:10:40.308792 7f74affcf700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/3006517 client protocol 0
2015-05-23 03:10:40.309587 7f74aeebe700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/1007420 client protocol 0
2015-05-23 03:10:40.348127 7f74aeaba700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.101:0/1020905 client protocol 0
2015-05-23 03:10:40.351846 7f74ad6a6700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.106:0/2011199 client protocol 0
2015-05-23 03:10:40.365362 7f74acc9c700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/1023531 client protocol 0

And then things like:

2015-05-23 03:12:40.579872 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.579879 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.579883 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x252ecec0 next 190533 (onetime)
2015-05-23 03:12:40.579923 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.579928 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.579932 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x9426b40 next 190533 (onetime)
2015-05-23 03:12:40.579965 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.579970 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.579974 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x359d8100 next 190533 (onetime)
2015-05-23 03:12:40.580010 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580016 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580019 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0xfa99240 next 190533 (onetime)
2015-05-23 03:12:40.580053 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580058 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580061 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x1ac933c0 next 190533 (onetime)
2015-05-23 03:12:40.580094 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580099 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580102 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0xa538000 next 190533 (onetime)
2015-05-23 03:12:40.580135 7f74b7f32700 10 mon.monitor01@0(electing) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.580140 7f74b7f32700 10 mon.monitor01@0(electing) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.580143 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 check_sub 0x1ac93140 next 190533 (onetime)
2015-05-23 03:12:40.581487 7f74b7f32700  5 mon.monitor01@0(electing).elector(655) handle_ack from mon.1
2015-05-23 03:12:40.581492 7f74b7f32700  5 mon.monitor01@0(electing).elector(655)  so far i have {0=70368744177663,1=70368744177663}
2015-05-23 03:12:40.581497 7f74b7f32700 10 mon.monitor01@0(electing).elector(655) bump_epoch 655 to 656
2015-05-23 03:12:40.603273 7f74b7f32700 10 mon.monitor01@0(electing) e6 join_election
2015-05-23 03:12:40.603281 7f74b7f32700 10 mon.monitor01@0(electing) e6 _reset
2015-05-23 03:12:40.603283 7f74b7f32700 10 mon.monitor01@0(electing) e6 cancel_probe_timeout (none scheduled)
2015-05-23 03:12:40.603285 7f74b7f32700 10 mon.monitor01@0(electing) e6 timecheck_finish
2015-05-23 03:12:40.603286 7f74b7f32700 10 mon.monitor01@0(electing) e6 scrub_reset
2015-05-23 03:12:40.603315 7f74b7f32700 10 mon.monitor01@0(electing) e6 win_election epoch 656 quorum 0,1 features 70368744177663
2015-05-23 03:12:40.603326 7f74b7f32700  0 log_channel(cluster) log [INF] : mon.monitor01@0 won leader election with quorum 0,1
2015-05-23 03:12:40.615572 7f74a9262700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/1025658 client protocol 0
2015-05-23 03:12:40.657969 7f74a9060700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/2023531 client protocol 0
2015-05-23 03:12:40.681508 7f74a8e5e700 10 mon.monitor01@0(leader) e6 ms_verify_authorizer 192.168.213.103:0/1011519 client protocol 0
2015-05-23 03:12:40.686338 7f74b7f32700 10 mon.monitor01@0(leader).data_health(656) start_epoch epoch 656
2015-05-23 03:12:40.686352 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_finish
2015-05-23 03:12:40.686357 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686360 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275596 192.168.213.103:0/2004855 is open for client.17275596 192.168.213.103:0/2004855
2015-05-23 03:12:40.686371 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686402 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686405 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272524 192.168.213.103:0/2012444 is open for client.17272524 192.168.213.103:0/2012444
2015-05-23 03:12:40.686413 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686435 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686438 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.18215582 192.168.213.106:0/4041278 is open for client.18215582 192.168.213.106:0/4041278
2015-05-23 03:12:40.686447 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686496 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686500 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273120 192.168.213.103:0/1019238 is open for client.17273120 192.168.213.103:0/1019238
2015-05-23 03:12:40.686509 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686535 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686536 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272886 192.168.213.103:0/2015460 is open for client.17272886 192.168.213.103:0/2015460
2015-05-23 03:12:40.686540 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686550 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686552 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272970 192.168.213.103:0/1017082 is open for client.17272970 192.168.213.103:0/1017082
2015-05-23 03:12:40.686567 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686577 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686578 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.18216138 192.168.213.103:0/3038561 is open for client.18216138 192.168.213.103:0/3038561
2015-05-23 03:12:40.686582 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686592 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686593 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272520 192.168.213.103:0/1012444 is open for client.17272520 192.168.213.103:0/1012444
2015-05-23 03:12:40.686597 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686608 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686610 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.19621431 192.168.213.103:0/2025791 is open for client.19621431 192.168.213.103:0/2025791
2015-05-23 03:12:40.686613 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686623 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686624 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273004 192.168.213.103:0/1017826 is open for client.17273004 192.168.213.103:0/1017826
2015-05-23 03:12:40.686628 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686638 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686639 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17436946 192.168.213.101:0/2040116 is open for client.17436946 192.168.213.101:0/2040116
2015-05-23 03:12:40.686643 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686653 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686654 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275494 192.168.213.103:0/1002572 is open for client.17275494 192.168.213.103:0/1002572
2015-05-23 03:12:40.686658 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686668 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686669 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.18216140 192.168.213.103:0/4038561 is open for client.18216140 192.168.213.103:0/4038561
2015-05-23 03:12:40.686673 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686683 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686684 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272872 192.168.213.103:0/1015258 is open for client.17272872 192.168.213.103:0/1015258
2015-05-23 03:12:40.686688 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686697 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686699 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275544 192.168.213.101:0/2033630 is open for client.17275544 192.168.213.101:0/2033630
2015-05-23 03:12:40.686703 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.686723 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.686725 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17275542 192.168.213.101:0/1033630 is open for client.17275542 192.168.213.101:0/1033630

...
...
...

2015-05-23 03:12:40.687360 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.687370 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.687372 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17274530 192.168.213.103:0/1031546 is open for client.17274530 192.168.213.103:0/1031546
2015-05-23 03:12:40.687376 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.687387 7f74b7f32700 10 mon.monitor01@0(leader) e6 resend_routed_requests
2015-05-23 03:12:40.687389 7f74b7f32700 10 mon.monitor01@0(leader) e6 register_cluster_logger - already registered
2015-05-23 03:12:40.687391 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start
2015-05-23 03:12:40.687392 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start_round curr 0
2015-05-23 03:12:40.687394 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start_round new 1
2015-05-23 03:12:40.687395 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck
2015-05-23 03:12:40.687396 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck start timecheck epoch 656 round 1
2015-05-23 03:12:40.687402 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck send time_check( ping e 656 r 1 ) v1 to mon.1 192.168.217.203:6789/0
2015-05-23 03:12:40.687413 7f74b7f32700 10 mon.monitor01@0(leader) e6 timecheck_start_round setting up next event
2015-05-23 03:12:40.690522 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.690524 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273862 192.168.213.103:0/1025658 is open for client.17273862 192.168.213.103:0/1025658
2015-05-23 03:12:40.690528 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.690545 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.690547 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17273598 192.168.213.103:0/2023531 is open for client.17273598 192.168.213.103:0/2023531
2015-05-23 03:12:40.690550 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.690566 7f74b7f32700 10 mon.monitor01@0(leader) e6 do not have session, making new one
2015-05-23 03:12:40.690568 7f74b7f32700 10 mon.monitor01@0(leader) e6 ms_dispatch new session MonSession: client.17272358 192.168.213.103:0/1011519 is open for client.17272358 192.168.213.103:0/1011519
2015-05-23 03:12:40.690571 7f74b7f32700 10 mon.monitor01@0(leader) e6 setting timeout on session
2015-05-23 03:12:40.690590 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690593 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690595 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x252ecec0 next 190533 (onetime)
2015-05-23 03:12:40.690615 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690618 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690620 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x9426b40 next 190533 (onetime)
2015-05-23 03:12:40.690637 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690640 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690642 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x359d8100 next 190533 (onetime)
2015-05-23 03:12:40.690659 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690662 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690663 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0xfa99240 next 190533 (onetime)
2015-05-23 03:12:40.690681 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690683 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690685 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x1ac933c0 next 190533 (onetime)
2015-05-23 03:12:40.690701 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690704 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690705 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0xa538000 next 190533 (onetime)
2015-05-23 03:12:40.690724 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690727 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690729 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x1ac93140 next 190533 (onetime)
2015-05-23 03:12:40.690746 7f74b7f32700 10 mon.monitor01@0(leader) e6 handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2
2015-05-23 03:12:40.690749 7f74b7f32700 10 mon.monitor01@0(leader) e6 check_sub monmap next 7 have 6
2015-05-23 03:12:40.690750 7f74b7f32700 10 mon.monitor01@0(leader).osd e190532 check_sub 0x252ecec0 next 190533 (onetime)
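
To see which messages dominate debug output like the excerpts above, the lines can be tallied by monitor state and message type. The following is only a minimal sketch, not a Ceph tool: it assumes the line layout seen in these excerpts, and the names `LINE_RE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above (not an official format description):
#   <date> <time> <thread> <level> mon.<name>@<rank>(<state>)[.<subsystem>] [e<epoch>] <message>
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<thread>[0-9a-f]+)\s+(?P<level>\d+) "
    r"mon\.(?P<name>[^@\s]+)@(?P<rank>-?\d+)\((?P<state>[^)]+)\)"
    r"(?P<subsys>\.\S+)?\s+(?:e\d+\s+)?(?P<msg>.*)$"
)


def summarize(lines):
    """Count parsed lines by (monitor state, first token of the message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Lines in another layout (e.g. log_channel lines) are skipped.
            continue
        words = match.group("msg").split()
        token = words[0] if words else ""
        counts[(match.group("state"), token)] += 1
    return counts


# A few lines copied from the excerpts above.
sample = [
    "2015-05-23 03:10:40.289533 7f74b0ada700 10 mon.monitor01@0(leader) e6 "
    "ms_verify_authorizer 192.168.213.106:0/3041278 client protocol 0",
    "2015-05-23 03:12:40.579872 7f74b7f32700 10 mon.monitor01@0(electing) e6 "
    "handle_subscribe mon_subscribe({monmap=7+,osdmap=190533}) v2",
    "2015-05-23 03:12:40.579883 7f74b7f32700 10 mon.monitor01@0(electing).osd e190532 "
    "check_sub 0x252ecec0 next 190533 (onetime)",
    "2015-05-23 03:12:40.603315 7f74b7f32700 10 mon.monitor01@0(electing) e6 "
    "win_election epoch 656 quorum 0,1 features 70368744177663",
]

for (state, token), count in summarize(sample).most_common():
    print(state, token, count)
```

Run over the whole monitor log (any iterable of lines works, including an open file), a tally like this may help show how often `win_election` occurs and which requests pile up while the monitor is in the `electing` state.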


Any clue why I am facing this issue?
Any help is appreciated.
Thanks




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
