The lease timeout means this (peon) monitor hasn't heard from the leader monitor for too long: its read lease on the system state has expired. So it calls a new election, since from its perspective the leader is down or misbehaving. Do the other monitors have a similar problem at this stage?
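If you want to see how big that window is, the lease length is the mon_lease option (5 seconds by default, as far as I remember), and it's worth checking whether the other mons logged trouble at the same moment. A quick sketch, assuming the mon name from your log and the default log location:

    # lease length on this monitor, in seconds
    ceph daemon mon.ceph-mon3 config get mon_lease

    # did the other monitors hit the same thing at the same time?
    grep -E 'lease_timeout|calling new election' /var/log/ceph/ceph-mon.*.log

Comparing all the mons' logs around 16:33:17 should tell you whether it's this one peon losing touch or the leader itself struggling (overload, flaky network between the mons, that sort of thing).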
The manager freezing until you restart it is a separate bug, but I'm not sure what the dashboard/mgr people will want to see there. John?
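(As a stopgap while that gets looked at, and assuming a systemd deployment, bouncing the mgr is just

    # assumes the mgr id is the short hostname; substitute yours if it differs
    systemctl restart ceph-mgr@$(hostname -s)

which is obviously a workaround, not a fix.)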
-Greg
On Sun, Jan 28, 2018 at 9:11 AM Karun Josy <karunjosy1@xxxxxxxxx> wrote:
The issue is still continuing. Has anyone else noticed it?
When this happens, the Ceph Dashboard GUI gets stuck and we have to restart the manager daemon to make it work again.

Karun Josy

_______________________________________________
On Wed, Jan 17, 2018 at 6:16 AM, Karun Josy <karunjosy1@xxxxxxxxx> wrote:

Hello,

In one of our cluster setups, frequent monitor elections keep happening. In the log of one of the monitors, there is a "lease_timeout" message right before each election. Can anyone help me figure it out?
(When this happens, the Ceph Dashboard GUI gets stuck and we have to restart the manager daemon to make it work again.)

Ceph version: Luminous 12.2.2

Log:
=========================
2018-01-16 16:33:08.001937 7f0cfbaad700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 885] Compacted 1@0 + 1@1 files to L1 => 20046585 bytes
2018-01-16 16:33:08.015891 7f0cfbaad700 4 rocksdb: (Original Log Time 2018/01/16-16:33:08.015826) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[0 1 0 0 0 0 0] max score 0.07, MB/sec: 32.7 rd, 30.9 wr, level 1, files in(1, 1) out(1) MB in(1.3, 18.9) out(19.1), read-write-amplify(31.0) write-amplify(15.1) OK, records in: 4305, records dropped: 515
2018-01-16 16:33:08.015897 7f0cfbaad700 4 rocksdb: (Original Log Time 2018/01/16-16:33:08.015840) EVENT_LOG_v1 {"time_micros": 1516149188015833, "job": 885, "event": "compaction_finished", "compaction_time_micros": 647876, "output_level": 1, "num_output_files": 1, "total_output_size": 20046585, "num_input_records": 4305, "num_output_records": 3790, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 0, 0, 0, 0, 0]}
2018-01-16 16:33:08.016131 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1516149188016128, "job": 885, "event": "table_file_deletion", "file_number": 2419}
2018-01-16 16:33:08.018147 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1516149188018146, "job": 885, "event": "table_file_deletion", "file_number": 2417}
2018-01-16 16:33:11.051010 7f0d042be700 0 mon.ceph-mon3@2(peon).data_health(436) update_stats avail 84% total 20918 MB, used 2179 MB, avail 17653 MB
2018-01-16 16:33:17.269954 7f0d042be700 1 mon.ceph-mon3@2(peon).paxos(paxos active c 84337..84838) lease_timeout -- calling new election
2018-01-16 16:33:17.291096 7f0d01ab9700 0 log_channel(cluster) log [INF] : mon.ceph-sgp-mon3 calling new monitor election
2018-01-16 16:33:17.291182 7f0d01ab9700 1 mon.ceph-mon3@2(electing).elector(436) init, last seen epoch 436
2018-01-16 16:33:20.834853 7f0d01ab9700 1 mon.ceph-mon3@2(peon).log v23189 check_sub sending message to client.65755 10.255.0.95:0/2603001850 with 8 entries (version 23189)

Karun
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com