Re: ceph-mon logging like crazy because....?

I think with the latest version of the code this message is printed only at log level 5; earlier it was level 1. Here is a link to an earlier conversation I had with Sage about this.

http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881

So, IMO there is nothing to worry about here other than the log spam, which is fixed in the latest build; alternatively, you can suppress it with debug mon = 0/0
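For reference, suppressing it would look roughly like this (a sketch; the section placement is the usual one, adjust to your own ceph.conf layout):

```ini
# ceph.conf: disable monitor debug logging entirely
# (first value = log level, second = in-memory log level)
[mon]
    debug mon = 0/0
```

On a running monitor the same setting can be injected without a restart with something like `ceph tell mon.* injectargs '--debug-mon 0/0'`.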

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Jan Schermer
Sent: Tuesday, June 02, 2015 11:33 AM
To: ceph-users
Subject: Re:  ceph-mon logging like crazy because....?

Another follow-up.
The whole madness started with “mon compact”, which we run from cron (otherwise leveldb eats all the space). It had been running for about 14 days with no incident.
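The cron entry itself is nothing fancy; roughly like this (a sketch, the schedule and mon name are examples matching the log timestamps below, not necessarily our exact values):

```
# /etc/cron.d/ceph-mon-compact -- hypothetical schedule
# compact the local monitor's leveldb store once a day
40 16 * * * root ceph tell mon.node-14 compact
```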

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.000000 has v0 lc 36597321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.000000 has v0 lc 36597321


The sequence that follows is:
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…
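To quantify how bad the spam still is before reporting it, the offending messages can be counted per minute; a minimal sketch (the sample log below is made up for illustration, on a real node you would point it at something like /var/log/ceph/ceph-mon.node-14.log):

```shell
# Create a tiny stand-in for the monitor log
cat > /tmp/mon-sample.log <<'EOF'
2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.000000 has v0 lc 36604772
2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.000000 has v0 lc 36604772
2015-06-02 18:10:01.000000 7f4309d45700  0 mon.node-14@2(peon) e3 handle_command mon_command({"prefix": "compact"} v 0) v1
EOF

# Count is_readable entries per minute
# (first 16 characters of each line are "YYYY-MM-DD HH:MM")
grep 'is_readable' /tmp/mon-sample.log | cut -c1-16 | sort | uniq -c
```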

Jan

> On 02 Jun 2015, at 20:17, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>
> Actually looks like it stopped, but here’s a more representative
> sample (notice how often it logged this!)
>
> v0 lc 36602135
> 2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
> 2015-06-02 17:39:59.866325 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
>
>
>
>> On 02 Jun 2015, at 20:14, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>
>> Our mons just went into a logging frenzy.
>>
>> We have 3 mons in the cluster, and they mostly log stuff like this
>>
>> 2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>>
>>
>> There are a few lines every second, sometimes more, sometimes less
>> (tell me if that's normal; I'm not sure).
>>
>> Two of them went completely haywire, one log is 17GB now and rising. It’s still mostly the same content, just more frequent:
>>
>> 2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954861 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:01.249648 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249668 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249697 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249699 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249708 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249709 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249736 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249736 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>>
>>
>> Any idea what it might be? Clocks look synced, and there is no other apparent problem that I can see; the cluster is working.
>> I'd like to know why this happened before I restart the unhealthy mons, which (I hope) will fix this.
>>
>> Thanks
>>
>> Jan
>>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






