One Mon's log is huge and this Mon goes down often

This started happening after I used *ceph-deploy create ceph01-vm ceph02-vm ceph04-vm* to
create 3 Mon members.
Now, every 10 hours or so, one Mon goes down, and every time it logs the error
below, even though sometimes the hard disk still has plenty of space left, such as 30G.

When I deployed Ceph before, I created only one Mon in the first step (*ceph-deploy
create ceph01-vm*) and then ran *ceph-deploy mon add ceph02-vm*; I did not hit this
problem. A rough sketch of both sequences is below.
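For reference, the two sequences were roughly as follows (a sketch from memory, so the
exact ceph-deploy subcommands are my best guess; *mon create-initial* in particular is
an assumption about how the three mons were brought up together):

    # this time: all three mons declared and brought up together
    ceph-deploy new ceph01-vm ceph02-vm ceph04-vm
    ceph-deploy mon create-initial

    # before: one mon first, then add the others one at a time
    ceph-deploy new ceph01-vm
    ceph-deploy mon create ceph01-vm
    ceph-deploy mon add ceph02-vm
    ceph-deploy mon add ceph04-vm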

I do not know why.

2014-08-23 10:19:43.910650 7f3c0028c700  0 mon.ceph01-vm@1(peon).data_health(56) *update_stats avail 5% total 15798272 used 12941508 avail 926268*
2014-08-23 10:19:43.910806 7f3c0028c700 -1 mon.ceph01-vm@1(peon).data_health(56) reached critical levels of available space on local monitor storage -- shutdown!
2014-08-23 10:19:43.910811 7f3c0028c700  0 ** Shutdown via Data Health Service **
2014-08-23 10:19:43.931427 7f3bffa8b700  1 mon.ceph01-vm@1(peon).paxos(paxos active c 15814..16493) is_readable now=2014-08-23 10:19:43.931433 lease_expire=2014-08-23 10:19:45.989585 has v0 lc 16493
2014-08-23 10:19:43.931486 7f3bfe887700 -1 mon.ceph01-vm@1(peon) e2 *** Got Signal Interrupt ***
2014-08-23 10:19:43.931515 7f3bfe887700  1 mon.ceph01-vm@1(peon) e2 shutdown
2014-08-23 10:19:43.931725 7f3bfe887700  0 quorum service shutdown
2014-08-23 10:19:43.931730 7f3bfe887700  0 mon.ceph01-vm@1(shutdown).health(56) HealthMonitor::service_shutdown 1 services
2014-08-23 10:19:43.931735 7f3bfe887700  0 quorum service shutdown
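If I read those messages right, the data_health check looks at the filesystem that
holds the monitor's data directory, not the whole disk: the update_stats line above
shows avail 5% of a ~15G filesystem, which matches what I understand to be the
default shutdown threshold (mon_data_avail_crit = 5%, with a warning at
mon_data_avail_warn = 30%). A rough way to check (paths assume the default
locations):

    # free space on the filesystem holding the mon data dir
    df -h /var/lib/ceph/mon/ceph-ceph01-vm

    # how much of that is the huge log vs. the mon store itself
    du -sh /var/log/ceph/ceph-mon.ceph01-vm.log /var/lib/ceph/mon/ceph-ceph01-vm

    # the thresholds the data_health service checks against
    ceph daemon mon.ceph01-vm config get mon_data_avail_warn
    ceph daemon mon.ceph01-vm config get mon_data_avail_crit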



2014-08-22 21:31 GMT+07:00 debian Only <onlydebian@gmail.com>:

> This time ceph01-vm went down and no big log appeared; the other 2 are OK. I do
> not know what the reason is. This is not my first time installing Ceph, but it
> is the first time I have seen a mon go down again and again.
>
> ceph.conf on every OSD and MON host:
> [global]
> fsid = 075f1aae-48de-412e-b024-b0f014dbc8cf
> mon_initial_members = ceph01-vm, ceph02-vm, ceph04-vm
> mon_host = 192.168.123.251,192.168.123.252,192.168.123.250
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
>
> rgw print continue = false
> rgw dns name = ceph-radosgw
> osd pool default pg num = 128
> osd pool default pgp num = 128
>
>
> [client.radosgw.gateway]
> host = ceph-radosgw
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /var/log/ceph/client.radosgw.gateway.log
>
>
> 2014-08-22 18:15 GMT+07:00 Joao Eduardo Luis <joao.luis@inktank.com>:
>
> On 08/22/2014 10:21 AM, debian Only wrote:
>>
>>> I have 3 mons in Ceph 0.80.5 on Wheezy, plus one RadosGW.
>>>
>>> When this happened the first time, I increased the mon debug level.
>>> This time mon.ceph02-vm went down; only this mon is down, the other 2 are OK.
>>>
>>> Please, can someone give me some guidance?
>>>
>>>   27M Aug 22 02:11 ceph-mon.ceph04-vm.log
>>>   43G Aug 22 02:11 ceph-mon.ceph02-vm.log
>>>   2G Aug 22 02:11 ceph-mon.ceph01-vm.log
>>>
>>
>> Depending on the debug level you set, and on which subsystems you set a higher
>> debug level for, the monitor can spit out A LOT of information in a short
>> period of time.  43GB is nothing compared to some of the 100+ GB logs I've had
>> to churn through in the past.
>>
>> However, I'm not grasping what kind of help you need.  According to your
>> 'ceph -s' below the monitors seem okay -- all are in, health is OK.
>>
>> If your issue is with having that one monitor spitting out humongous
>> amounts of debug info, here's what you need to do:
>>
>> - If you added one or more 'debug <something> = X' to that monitor's
>> ceph.conf, you will want to remove them so that in a future restart the
>> monitor doesn't start with non-default debug levels.
>>
>> - You will want to inject default debug levels into that one monitor.
>>
>> Depending on what debug levels you have increased, you will want to run a
>> version of "ceph tell mon.ceph02-vm injectargs '--debug-mon 1/5 --debug-ms
>> 0/5 --debug-paxos 1/5'"
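>>
>> To double-check what the monitor currently has set (an example; adjust the
>> mon name as needed):
>>
>>   ceph daemon mon.ceph02-vm config show | grep debug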
>>
>>   -Joao
>>
>>
>>> # ceph -s
>>>      cluster 075f1aae-48de-412e-b024-b0f014dbc8cf
>>>       health HEALTH_OK
>>>       monmap e2: 3 mons at {ceph01-vm=192.168.123.251:6789/0,ceph02-vm=192.168.123.252:6789/0,ceph04-vm=192.168.123.250:6789/0},
>>>       election epoch 44, quorum 0,1,2 ceph04-vm,ceph01-vm,ceph02-vm
>>>       mdsmap e10: 1/1/1 up {0=ceph06-vm=up:active}
>>>       osdmap e145: 10 osds: 10 up, 10 in
>>>        pgmap v4394: 2392 pgs, 21 pools, 4503 MB data, 1250 objects
>>>              13657 MB used, 4908 GB / 4930 GB avail
>>>                  2392 active+clean
>>>
>>>
>>> 2014-08-22 02:06:34.738828 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:34.738830 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.618805 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.618807 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.620019 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620021 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.620975 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620977 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.629362 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "mon_status", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.633007 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.637002 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "health", "detail": "", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.640971 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"dumpcontents": ["pgs_brief"], "prefix": "pg dump", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.641014 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.641016 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:37.520387 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9757) is_readable now=2014-08-22 02:06:37.520388 lease_expire=2014-08-22 02:06:42.501572 has v0 lc 9757
>>>
>>
>> --
>> Joao Eduardo Luis
>> Software Engineer | http://inktank.com | http://ceph.com
>>
>
>