It seems most of the reason the mon goes down is the big log. I have
set debug paxos = 0 and I am watching now. Before setting this:

# ceph daemon mon.ceph01-vm config get debug_mon
{ "debug_mon": "1\/5"}
# ceph daemon mon.ceph01-vm config get debug_ms
{ "debug_ms": "0\/5"}
# ceph daemon mon.ceph01-vm config get debug_paxos
{ "debug_paxos": "1\/5"}

After setting this:

# ceph daemon mon.ceph01-vm config get debug_mon
{ "debug_mon": "1\/5"}
# ceph daemon mon.ceph01-vm config get debug_ms
{ "debug_ms": "0\/5"}
# ceph daemon mon.ceph01-vm config get debug_paxos
{ "debug_paxos": "0\/0"}


2014-08-24 18:58 GMT+07:00 Joao Eduardo Luis <joao.luis at inktank.com>:

> On 08/24/2014 01:57 AM, debian Only wrote:
>
>> This happened when I used "ceph-deploy create ceph01-vm ceph02-vm
>> ceph04-vm" to create the 3 mon members.
>>
>> Now, every 10 hours, one mon goes down, every time with this error,
>> even though sometimes the hard disk still has enough space left,
>> such as 30 GB.
>>
>> When I deployed Ceph before, I created only one mon in the first
>> step ("ceph-deploy create ceph01-vm") and then ran "ceph-deploy mon
>> add ceph02-vm"; I did not meet this problem.
>>
>> I do not know why.
>
> Your monitor shut down because the disk the monitor is sitting on has
> dropped to (or below) 5% of available disk space. This is meant to
> prevent the monitor from running out of disk space and being unable
> to store critical cluster information. 5% is a rough estimate, which
> may be adequate for some disks, but may be either too small or too
> large for small disks and large disks respectively. This value can be
> adjusted if you feel like you need to, using the
> 'mon_data_avail_crit' option (which defaults to 5, as in 5%, but can
> be adjusted to whatever suits you best).
>
> The big problem here, however, seems to be that you're running out of
> space due to huge monitor logs. Is that it?
>
> If so, I would ask you to run the following commands and share the
> results:
>
> ceph daemon mon.* config get debug_mon
> ceph daemon mon.* config get debug_ms
> ceph daemon mon.* config get debug_paxos
>
>   -Joao
>
>> 2014-08-23 10:19:43.910650 7f3c0028c700  0
>> mon.ceph01-vm@1(peon).data_health(56) update_stats avail 5% total
>> 15798272 used 12941508 avail 926268
>> 2014-08-23 10:19:43.910806 7f3c0028c700 -1
>> mon.ceph01-vm@1(peon).data_health(56) reached critical levels of
>> available space on local monitor storage -- shutdown!
>> 2014-08-23 10:19:43.910811 7f3c0028c700  0 ** Shutdown via Data
>> Health Service **
>> 2014-08-23 10:19:43.931427 7f3bffa8b700  1
>> mon.ceph01-vm@1(peon).paxos(paxos active c 15814..16493) is_readable
>> now=2014-08-23 10:19:43.931433 lease_expire=2014-08-23
>> 10:19:45.989585 has v0 lc 16493
>> 2014-08-23 10:19:43.931486 7f3bfe887700 -1 mon.ceph01-vm@1(peon) e2
>> *** Got Signal Interrupt ***
>> 2014-08-23 10:19:43.931515 7f3bfe887700  1 mon.ceph01-vm@1(peon) e2
>> shutdown
>> 2014-08-23 10:19:43.931725 7f3bfe887700  0 quorum service shutdown
>> 2014-08-23 10:19:43.931730 7f3bfe887700  0
>> mon.ceph01-vm@1(shutdown).health(56) HealthMonitor::service_shutdown
>> 1 services
>> 2014-08-23 10:19:43.931735 7f3bfe887700  0 quorum service shutdown
>>
>>
>> 2014-08-22 21:31 GMT+07:00 debian Only <onlydebian at gmail.com>:
>>
>> This time ceph01-vm went down, and no big log happened; the other 2
>> are OK. I do not know the reason. This is not my first time
>> installing Ceph, but it is the first time I have met a mon going
>> down again and again.
>>
>> ceph.conf on each of the OSDs and MONs:
>>
>> [global]
>> fsid = 075f1aae-48de-412e-b024-b0f014dbc8cf
>> mon_initial_members = ceph01-vm, ceph02-vm, ceph04-vm
>> mon_host = 192.168.123.251,192.168.123.252,192.168.123.250
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>>
>> rgw print continue = false
>> rgw dns name = ceph-radosgw
>> osd pool default pg num = 128
>> osd pool default pgp num = 128
>>
>> [client.radosgw.gateway]
>> host = ceph-radosgw
>> keyring = /etc/ceph/ceph.client.radosgw.keyring
>> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>> log file = /var/log/ceph/client.radosgw.gateway.log
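As a sketch of the adjustment Joao describes above: the 5% cutoff is
the 'mon_data_avail_crit' option, which can be changed persistently in
ceph.conf or at runtime. The value 10 below is only an illustrative
choice, and the runtime commands assume the default admin socket path
and client.admin credentials:

# persistent: in ceph.conf on each monitor host
[mon]
mon data avail crit = 10

# runtime, on the monitor's own host, via the admin socket
ceph daemon mon.ceph01-vm config set mon_data_avail_crit 10

# or from any node that can reach the cluster
ceph tell mon.ceph01-vm injectargs '--mon-data-avail-crit 10'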
>>
>> 2014-08-22 18:15 GMT+07:00 Joao Eduardo Luis <joao.luis at inktank.com>:
>>
>> On 08/22/2014 10:21 AM, debian Only wrote:
>>
>> I have 3 mons in Ceph 0.80.5 on Wheezy, and one RadosGW.
>>
>> When this happened the first time, I increased the mon log level.
>> This time mon.ceph02-vm is down; only this mon is down, the other 2
>> are OK.
>>
>> Please, can someone give me some guidance?
>>
>> 27M Aug 22 02:11 ceph-mon.ceph04-vm.log
>> 43G Aug 22 02:11 ceph-mon.ceph02-vm.log
>> 2G  Aug 22 02:11 ceph-mon.ceph01-vm.log
>>
>> Depending on the debug level you set, and depending on which
>> subsystems you set a higher debug level on, the monitor can spit out
>> A LOT of information in a short period of time. 43GB is nothing
>> compared to some 100+ GB logs I've had to churn through in the past.
>>
>> However, I'm not grasping what kind of help you need. According to
>> your 'ceph -s' below the monitors seem okay -- all are in, health is
>> OK.
>>
>> If your issue is with having that one monitor spitting out humongous
>> amounts of debug info, here's what you need to do:
>>
>> - If you added one or more 'debug <something> = X' lines to that
>> monitor's ceph.conf, you will want to remove them so that on a
>> future restart the monitor doesn't start with non-default debug
>> levels.
>>
>> - You will want to inject default debug levels into that one
>> monitor.
>>
>> Depending on what debug levels you have increased, you will want to
>> run a version of "ceph tell mon.ceph02-vm injectargs '--debug-mon
>> 1/5 --debug-ms 0/5 --debug-paxos 1/5'".
>>
>>   -Joao
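A note on the 'X/Y' values in these commands: the first number is the
debug level written to the log file, the second the level gathered in
memory (dumped only on certain events such as a crash). A minimal
sketch of pinning the quieter defaults persistently, assuming the
lines go under [mon] (or [global]) in each monitor's ceph.conf so a
restart does not bring verbose levels back:

[mon]
debug mon = 1/5
debug ms = 0/5
debug paxos = 1/5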
>> # ceph -s
>>     cluster 075f1aae-48de-412e-b024-b0f014dbc8cf
>>      health HEALTH_OK
>>      monmap e2: 3 mons at
>> {ceph01-vm=192.168.123.251:6789/0,ceph02-vm=192.168.123.252:6789/0,ceph04-vm=192.168.123.250:6789/0},
>> election epoch 44, quorum 0,1,2 ceph04-vm,ceph01-vm,ceph02-vm
>>      mdsmap e10: 1/1/1 up {0=ceph06-vm=up:active}
>>      osdmap e145: 10 osds: 10 up, 10 in
>>       pgmap v4394: 2392 pgs, 21 pools, 4503 MB data, 1250 objects
>>             13657 MB used, 4908 GB / 4930 GB avail
>>                 2392 active+clean
>>
>> 2014-08-22 02:06:34.738828 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:34.738830 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.618805 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.618807 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.620019 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.620021 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.620975 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.620977 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:36.629362 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"prefix": "mon_status", "format":
>> "json"} v 0) v1
>> 2014-08-22 02:06:36.633007 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"prefix": "status", "format": "json"} v
>> 0) v1
>> 2014-08-22 02:06:36.637002 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"prefix": "health", "detail": "",
>> "format": "json"} v 0) v1
>> 2014-08-22 02:06:36.640971 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2
>> handle_command mon_command({"dumpcontents": ["pgs_brief"], "prefix":
>> "pg dump", "format": "json"} v 0) v1
>> 2014-08-22 02:06:36.641014 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable
>> now=2014-08-22 02:06:36.641016 lease_expire=2014-08-22
>> 02:06:39.701305 has v0 lc 9756
>> 2014-08-22 02:06:37.520387 7ff2b9557700  1
>> mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9757) is_readable
>> now=2014-08-22 02:06:37.520388 lease_expire=2014-08-22
>> 02:06:42.501572 has v0 lc 9757
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Joao Eduardo Luis
>> Software Engineer | http://inktank.com | http://ceph.com
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
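For anyone hitting the same situation, a few read-only checks that
could catch the disk filling up before the monitor's safety shutdown
trips; the paths below are the usual Debian/ceph-deploy defaults and
may differ on other setups:

# free space on the filesystem holding the monitor data store
df -h /var/lib/ceph/mon

# size of each monitor's log file
du -sh /var/log/ceph/ceph-mon.*.log

# state of one monitor, via its local admin socket
ceph daemon mon.ceph01-vm mon_status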
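Since the root cause in this thread was multi-gigabyte monitor logs
eating the disk, rotating those logs aggressively is the other half of
the fix. A hedged sketch of a logrotate policy follows; the Debian
ceph packages normally ship their own /etc/logrotate.d/ceph, so this
only illustrates the idea, with the retention numbers picked
arbitrarily. Ceph daemons reopen their log files on SIGHUP:

/var/log/ceph/ceph-mon.*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        # ask running monitors to reopen their (now rotated) logs
        killall -q -HUP ceph-mon || true
    endscript
}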