Re: ceph-mon logging like crazy because....?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Another follow-up.
The whole madness started with “mon compact” which we run from cron (else leveldb eats all space). It’s been running for about 14 days now with no incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.000000 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

> On 02 Jun 2015, at 20:17, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> 
> Actually looks like it stopped, but here’s a more representative sample
> (notice how often it logged this!)
> 
> v0 lc 36602135
> 2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866325 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 
> 
> 
>> On 02 Jun 2015, at 20:14, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>> 
>> Our mons just went into a logging frenzy.
>> 
>> We have 3 mons in the cluster, and they mostly log stuff like this
>> 
>> 2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 
>> 
>> There are few lines every second, sometimes more, sometimes less (tell me if that’s normal. I’m not sure)
>> 
>> Two of them went completely haywire, one log is 17GB now and rising. It’s still mostly the same content, just more frequent:
>> 
>> 2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954861 lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:01.249648 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249668 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249697 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249699 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249708 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249709 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249736 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249736 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 
>> 
>> Any idea what it might be? Clocks look synced, no other aparent problem that I can see, the cluster is working.
>> I’d like to know why this happened before I restart the unhealthy mons which (I hope) will fix this.
>> 
>> Thanks
>> 
>> Jan
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux