Re: lease_timeout

Thank you for looking into it.

Yes, I believe it is the same issue as reported in the bug.

 Sorry, I was not specific:
- The Health section is not updated.
- The Activity values under the Pools section (right side) get stuck; they keep showing the old data and are not updated.

However, the Cluster log section gets updated correctly.
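For reference, this is roughly how we confirm it and get the dashboard back; the mgr name and hostname below are just placeholders for our setup:

# compare what the mons report with what the dashboard is showing
ceph -s
ceph mgr dump | grep active_name

# fail over to a standby mgr, or restart the active mgr daemon
ceph mgr fail <active-mgr-name>
systemctl restart ceph-mgr@<hostname>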




Karun Josy

On Tue, Jan 30, 2018 at 1:35 AM, John Spray <jspray@xxxxxxxxxx> wrote:
On Mon, Jan 29, 2018 at 6:58 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> The lease timeout means this (peon) monitor hasn't heard from the leader
> monitor in too long; its read lease on the system state has expired. So it
> calls a new election since that means the leader is down or misbehaving. Do
> the other monitors have a similar problem at this stage?
>
> The manager freezing until you restart it is a separate bug, but I'm not
> sure what the dashboard/mgr people will want to see there. John?

There is a bug where the mgr will stop getting updates from the mon in
some situations (http://tracker.ceph.com/issues/22142), which is fixed
in master but not backported to luminous yet.

However, I don't know what "gets stuck" means in this context.  Karun,
can you be more specific?  Is it rendering but showing old data?  Is the
page not loading at all?
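
Separately, on the election side it is worth checking quorum membership,
clock skew between the mons, and the lease interval the mons are actually
running with (mon.<id> below is a placeholder), for example:

ceph quorum_status | grep quorum_names
ceph time-sync-status
ceph daemon mon.<id> config show | grep mon_lease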

John

> -Greg
>
> On Sun, Jan 28, 2018 at 9:11 AM Karun Josy <karunjosy1@xxxxxxxxx> wrote:
>>
>> The issue is still continuing. Has anyone else noticed it?
>>
>>
>> When this happens, the Ceph Dashboard GUI gets stuck and we have to
>> restart the manager daemon to make it work again
>>
>> Karun Josy
>>
>> On Wed, Jan 17, 2018 at 6:16 AM, Karun Josy <karunjosy1@xxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> In one of our cluster setups, frequent monitor elections keep
>>> happening.
>>> In the log of one of the monitors, there is a "lease_timeout" message
>>> just before that happens. Can anyone help me figure it out?
>>> (When this happens, the Ceph Dashboard GUI gets stuck and we have to
>>> restart the manager daemon to make it work again)
>>>
>>> Ceph version : Luminous 12.2.2
>>>
>>> Log :
>>> =========================
>>>
>>> 2018-01-16 16:33:08.001937 7f0cfbaad700  4 rocksdb:
>>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:1173]
>>> [default] [JOB 885] Compacted 1@0 + 1@1 files to L1 => 20046585 bytes
>>> 2018-01-16 16:33:08.015891 7f0cfbaad700  4 rocksdb: (Original Log Time
>>> 2018/01/16-16:33:08.015826)
>>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:621]
>>> [default] compacted to: base level 1 max bytes base 268435456 files[0 1 0 0
>>> 0 0 0] max score 0.07, MB/sec: 32.7 rd, 30.9 wr, level 1, files in(1, 1)
>>> out(1) MB in(1.3, 18.9) out(19.1), read-write-amplify(31.0)
>>> write-amplify(15.1) OK, records in: 4305, records dropped: 515
>>>
>>> 2018-01-16 16:33:08.015897 7f0cfbaad700  4 rocksdb: (Original Log Time
>>> 2018/01/16-16:33:08.015840) EVENT_LOG_v1 {"time_micros": 1516149188015833,
>>> "job": 885, "event": "compaction_finished", "compaction_time_micros":
>>> 647876, "output_level": 1, "num_output_files": 1, "total_output_size":
>>> 20046585, "num_input_records": 4305, "num_output_records": 3790,
>>> "num_subcompactions": 1, "num_single_delete_mismatches": 0,
>>> "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 0, 0, 0, 0, 0]}
>>> 2018-01-16 16:33:08.016131 7f0cfbaad700  4 rocksdb: EVENT_LOG_v1
>>> {"time_micros": 1516149188016128, "job": 885, "event":
>>> "table_file_deletion", "file_number": 2419}
>>> 2018-01-16 16:33:08.018147 7f0cfbaad700  4 rocksdb: EVENT_LOG_v1
>>> {"time_micros": 1516149188018146, "job": 885, "event":
>>> "table_file_deletion", "file_number": 2417}
>>> 2018-01-16 16:33:11.051010 7f0d042be700  0
>>> mon.ceph-mon3@2(peon).data_health(436) update_stats avail 84% total 20918
>>> MB, used 2179 MB, avail 17653 MB
>>> 2018-01-16 16:33:17.269954 7f0d042be700  1
>>> mon.ceph-mon3@2(peon).paxos(paxos active c 84337..84838) lease_timeout --
>>> calling new election
>>> 2018-01-16 16:33:17.291096 7f0d01ab9700  0 log_channel(cluster) log [INF]
>>> : mon.ceph-sgp-mon3 calling new monitor election
>>> 2018-01-16 16:33:17.291182 7f0d01ab9700  1
>>> mon.ceph-mon3@2(electing).elector(436) init, last seen epoch 436
>>> 2018-01-16 16:33:20.834853 7f0d01ab9700  1 mon.ceph-mon3@2(peon).log
>>> v23189 check_sub sending message to client.65755 10.255.0.95:0/2603001850
>>> with 8 entries (version 23189)
>>>
>>>
>>>
>>> Karun
>>
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
