It seems most of the reason the mon goes down is the big log. I have
set debug paxos = 0 and I am watching now. Before setting this:

# ceph daemon mon.ceph01-vm config get debug_mon
{ "debug_mon": "1\/5"}
# ceph daemon mon.ceph01-vm config get debug_ms
{ "debug_ms": "0\/5"}
# ceph daemon mon.ceph01-vm config get debug_paxos
{ "debug_paxos": "1\/5"}

After setting this:

# ceph daemon mon.ceph01-vm config get debug_mon
{ "debug_mon": "1\/5"}
# ceph daemon mon.ceph01-vm config get debug_ms
{ "debug_ms": "0\/5"}
# ceph daemon mon.ceph01-vm config get debug_paxos
{ "debug_paxos": "0\/0"}


2014-08-24 18:58 GMT+07:00 Joao Eduardo Luis <joao.luis at inktank.com>:

> On 08/24/2014 01:57 AM, debian Only wrote:
>
>> This happened when I used "ceph-deploy create ceph01-vm ceph02-vm
>> ceph04-vm" to create the 3 mon members.
>>
>> Now, every 10 hours, one mon goes down, every time with this error,
>> even though sometimes the hard disk still has enough space left,
>> such as 30 GB.
>>
>> When I deployed Ceph before, I created only one mon in the first
>> step ("ceph-deploy create ceph01-vm") and then ran "ceph-deploy mon
>> add ceph02-vm"; I did not meet this problem.
>>
>> I do not know why.
>
> Your monitor shut down because the disk the monitor is sitting on has
> dropped to (or below) 5% of available disk space. This is meant to
> prevent the monitor from running out of disk space and being unable
> to store critical cluster information. 5% is a rough estimate, which
> may be adequate for some disks, but may be either too small or too
> large for small disks and large disks respectively. This value can be
> adjusted if you feel like you need to, using the
> 'mon_data_avail_crit' option (which defaults to 5, as in 5%, but can
> be adjusted to whatever suits you best).
>
> The big problem here, however, seems to be that you're running out of
> space due to huge monitor logs. Is that it?
>
> If so, I would ask you to run the following commands and share the
> results:
>
> ceph daemon mon.* config get debug_mon
> ceph daemon mon.* config get debug_ms
> ceph daemon mon.* config get debug_paxos
>
>   -Joao
>
>> 2014-08-23 10:19:43.910650 7f3c0028c700  0
>> mon.ceph01-vm@1(peon).data_health(56) update_stats avail 5% total
>> 15798272 used 12941508 avail 926268
>> 2014-08-23 10:19:43.910806 7f3c0028c700 -1
>> mon.ceph01-vm@1(peon).data_health(56) reached critical levels of
>> available space on local monitor storage -- shutdown!
>> 2014-08-23 10:19:43.910811 7f3c0028c700  0 ** Shutdown via Data
>> Health Service **
>> 2014-08-23 10:19:43.931427 7f3bffa8b700  1
>> mon.ceph01-vm@1(peon).paxos(paxos active c 15814..16493) is_readable
>> now=2014-08-23 10:19:43.931433 lease_expire=2014-08-23
>> 10:19:45.989585 has v0 lc 16493
>> 2014-08-23 10:19:43.931486 7f3bfe887700 -1 mon.ceph01-vm@1(peon) e2
>> *** Got Signal Interrupt ***
>> 2014-08-23 10:19:43.931515 7f3bfe887700  1 mon.ceph01-vm@1(peon) e2
>> shutdown
>> 2014-08-23 10:19:43.931725 7f3bfe887700  0 quorum service shutdown
>> 2014-08-23 10:19:43.931730 7f3bfe887700  0
>> mon.ceph01-vm@1(shutdown).health(56) HealthMonitor::service_shutdown
>> 1 services
>> 2014-08-23 10:19:43.931735 7f3bfe887700  0 quorum service shutdown
>>
>>
>> 2014-08-22 21:31 GMT+07:00 debian Only <onlydebian at gmail.com>:
>>
>> This time ceph01-vm went down, and no big log happened; the other 2
>> are OK. I do not know the reason. This is not my first time
>> installing Ceph, but it is the first time I have met a mon going
>> down again and again.
>>
>> ceph.conf on each of the OSDs and MONs:
>>
>> [global]
>> fsid = 075f1aae-48de-412e-b024-b0f014dbc8cf
>> mon_initial_members = ceph01-vm, ceph02-vm, ceph04-vm
>> mon_host = 192.168.123.251,192.168.123.252,192.168.123.250
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>>
>> rgw print continue = false
>> rgw dns name = ceph-radosgw
>> osd pool default pg num = 128
>> osd pool default pgp num = 128
>>
>> [client.radosgw.gateway]
>> host = ceph-radosgw
>> keyring = /etc/ceph/ceph.client.radosgw.keyring
>> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>> log file = /var/log/ceph/client.radosgw.gateway.log
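As a sketch of the adjustment Joao describes above: the 5% cutoff is
the 'mon_data_avail_crit' option, which can be changed persistently in
ceph.conf or at runtime. The value 10 below is only an illustrative
choice, and the runtime commands assume the default admin socket path
and client.admin credentials:

# persistent: in ceph.conf on each monitor host
[mon]
mon data avail crit = 10

# runtime, on the monitor's own host, via the admin socket
ceph daemon mon.ceph01-vm config set mon_data_avail_crit 10

# or from any node that can reach the cluster
ceph tell mon.ceph01-vm injectargs '--mon-data-avail-crit 10'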
>>
>> 2014-08-22 18:15 GMT+07:00 Joao Eduardo Luis <joao.luis at inktank.com>:
>>
>> On 08/22/2014 10:21 AM, debian Only wrote:
>>
>> I have 3 mons in Ceph 0.80.5 on Wheezy, and one RadosGW.
>>
>> When this happened the first time, I increased the mon log level.
>> This time mon.ceph02-vm is down; only this mon is down, the other 2
>> are OK.
>>
>> Please, can someone give me some guidance?
>>
>> 27M Aug 22 02:11 ceph-mon.ceph04-vm.log
>> 43G Aug 22 02:11 ceph-mon.ceph02-vm.log
>> 2G  Aug 22 02:11 ceph-mon.ceph01-vm.log
>>
>> Depending on the debug level you set, and depending on which
>> subsystems you set a higher debug level on, the monitor can spit out
>> A LOT of information in a short period of time. 43GB is nothing
>> compared to some 100+ GB logs I've had to churn through in the past.
>>
>> However, I'm not grasping what kind of help you need. According to
>> your 'ceph -s' below the monitors seem okay -- all are in, health is
>> OK.
>>
>> If your issue is with having that one monitor spitting out humongous
>> amounts of debug info, here's what you need to do:
>>
>> - If you added one or more 'debug <something> = X' lines to that
>> monitor's ceph.conf, you will want to remove them so that on a
>> future restart the monitor doesn't start with non-default debug
>> levels.
>>
>> - You will want to inject default debug levels into that one
>> monitor.
>>
>> Depending on what debug levels you have increased, you will want to
>> run a version of "ceph tell mon.ceph02-vm injectargs '--debug-mon
>> 1/5 --debug-ms 0/5 --debug-paxos 1/5'".
>>
>>   -Joao
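A note on the 'X/Y' values in these commands: the first number is the
debug level written to the log file, the second the level gathered in
memory (dumped only on certain events such as a crash). A minimal
sketch of pinning the quieter defaults persistently, assuming the
lines go under [mon] (or [global]) in each monitor's ceph.conf so a
restart does not bring verbose levels back:

[mon]
debug mon = 1/5
debug ms = 0/5
debug paxos = 1/5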
>> # ceph -s
>>     cluster 075f1aae-48de-412e-b024-b0f014dbc8cf
>>      health HEALTH_OK
>>      monmap e2: 3 mons at
>> {ceph01-vm=192.168.123.251:6789/0,ceph02-vm=192.168.123.252:6789/0,ceph04-vm=192.168.123.250:6789/0},
>> election epoch 44, quorum 0,1,2 ceph04-vm,ceph01-vm,ceph02-vm
>>      mdsmap e10: 1/1/1 up {0=ceph06-vm=up:active}
>>      osdmap e145: 10 osds: 10 up, 10 in
>>       pgmap v4394: 2392 pgs, 21 pools, 4503 MB data, 1250 objects
>>             13657 MB used, 4908 GB / 4930 GB avail
>>                 2392 active+clean
>>
>> 2014-08-22 02:06:34.738828 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:34.738830 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.618805 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.618807 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.620019 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.620021 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.620975 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.620977 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.629362 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"prefix": "mon_status", "format":
>> "json"} v 0) v1
>> 2014-08-22 02:06:36.633007 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"prefix": "status", "format": "json"} v
>> 0) v1
>> 2014-08-22 02:06:36.637002 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"prefix": "health", "detail": "",
>> "format": "json"} v 0) v1
>> 2014-08-22 02:06:36.640971 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"dumpcontents": ["pgs_brief"], "prefix":
>> "pg dump", "format": "json"} v 0) v1
>> 2014-08-22 02:06:36.641014 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.641016 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:37.520387 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9757) is_readable
>> now=2014-08-22 02:06:37.520388 lease_expire=2014-08-22
>> 02:06:42.501572 has v0 lc 9757
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Joao Eduardo Luis
>> Software Engineer | http://inktank.com | http://ceph.com
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
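For anyone hitting the same situation, a few read-only checks that
could catch the disk filling up before the monitor's safety shutdown
trips; the paths below are the usual Debian/ceph-deploy defaults and
may differ on other setups:

# free space on the filesystem holding the monitor data store
df -h /var/lib/ceph/mon

# size of each monitor's log file
du -sh /var/log/ceph/ceph-mon.*.log

# state of one monitor, via its local admin socket
ceph daemon mon.ceph01-vm mon_status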
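Since the root cause in this thread was multi-gigabyte monitor logs
eating the disk, rotating those logs aggressively is the other half of
the fix. A hedged sketch of a logrotate policy follows; the Debian
ceph packages normally ship their own /etc/logrotate.d/ceph, so this
only illustrates the idea, with the retention numbers picked
arbitrarily. Ceph daemons reopen their log files on SIGHUP:

/var/log/ceph/ceph-mon.*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        # ask running monitors to reopen their (now rotated) logs
        killall -q -HUP ceph-mon || true
    endscript
}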