This happens when I use "ceph-deploy create ceph01-vm ceph02-vm ceph04-vm" to
create 3 mon members. Now, about every 10 hours, one mon goes down, and every
time it logs the error below. Sometimes the hard disk still has enough space
left, such as 30G.

When I deployed Ceph before, I created only one mon in the first step with
"ceph-deploy create ceph01-vm" and then ran "ceph-deploy mon add ceph02-vm",
and I did not hit this problem. I do not know why.

2014-08-23 10:19:43.910650 7f3c0028c700 0 mon.ceph01-vm@1(peon).data_health(56) update_stats avail 5% total 15798272 used 12941508 avail 926268
2014-08-23 10:19:43.910806 7f3c0028c700 -1 mon.ceph01-vm@1(peon).data_health(56) reached critical levels of available space on local monitor storage -- shutdown!
2014-08-23 10:19:43.910811 7f3c0028c700 0 ** Shutdown via Data Health Service **
2014-08-23 10:19:43.931427 7f3bffa8b700 1 mon.ceph01-vm@1(peon).paxos(paxos active c 15814..16493) is_readable now=2014-08-23 10:19:43.931433 lease_expire=2014-08-23 10:19:45.989585 has v0 lc 16493
2014-08-23 10:19:43.931486 7f3bfe887700 -1 mon.ceph01-vm@1(peon) e2 *** Got Signal Interrupt ***
2014-08-23 10:19:43.931515 7f3bfe887700 1 mon.ceph01-vm@1(peon) e2 shutdown
2014-08-23 10:19:43.931725 7f3bfe887700 0 quorum service shutdown
2014-08-23 10:19:43.931730 7f3bfe887700 0 mon.ceph01-vm@1(shutdown).health(56) HealthMonitor::service_shutdown 1 services
2014-08-23 10:19:43.931735 7f3bfe887700 0 quorum service shutdown


2014-08-22 21:31 GMT+07:00 debian Only <onlydebian at gmail.com>:

> This time ceph01-vm went down. There was no big log this time, and the
> other 2 are OK. I do not know what the reason is. This is not my first
> time installing Ceph, but it is the first time I have seen a mon go down
> again and again.
>
> ceph.conf on each of the OSDs and MONs:
> [global]
> fsid = 075f1aae-48de-412e-b024-b0f014dbc8cf
> mon_initial_members = ceph01-vm, ceph02-vm, ceph04-vm
> mon_host = 192.168.123.251,192.168.123.252,192.168.123.250
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
>
> rgw print continue = false
> rgw dns name = ceph-radosgw
> osd pool default pg num = 128
> osd pool default pgp num = 128
>
> [client.radosgw.gateway]
> host = ceph-radosgw
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /var/log/ceph/client.radosgw.gateway.log
>
>
> 2014-08-22 18:15 GMT+07:00 Joao Eduardo Luis <joao.luis at inktank.com>:
>
> On 08/22/2014 10:21 AM, debian Only wrote:
>>
>>> I have 3 mons in Ceph 0.80.5 on Wheezy, and one RadosGW.
>>>
>>> When this happened the first time, I increased the mon log device.
>>> This time mon.ceph02-vm went down. Only this mon is down; the other 2
>>> are OK.
>>>
>>> Could someone please give me some guidance?
>>>
>>> 27M Aug 22 02:11 ceph-mon.ceph04-vm.log
>>> 43G Aug 22 02:11 ceph-mon.ceph02-vm.log
>>> 2G Aug 22 02:11 ceph-mon.ceph01-vm.log
>>>
>>
>> Depending on the debug level you set, and depending on which subsystems
>> you set a higher debug level, the monitor can spit out A LOT of information
>> in a short period of time. 43GB is nothing compared to some 100+ GB logs
>> I've had churn through in the past.
>>
>> However, I'm not grasping what kind of help you need. According to your
>> 'ceph -s' below the monitors seem okay -- all are in, health is OK.
>>
>> If your issue is with having that one monitor spitting out humongous
>> amounts of debug info, here's what you need to do:
>>
>> - If you added one or more 'debug <something> = X' to that monitor's
>> ceph.conf, you will want to remove them so that in a future restart the
>> monitor doesn't start with non-default debug levels.
>>
>> - You will want to inject default debug levels into that one monitor.
>>
>> Depending on what debug levels you have increased, you will want to run a
>> version of
>> "ceph tell mon.ceph02-vm injectargs '--debug-mon 1/5 --debug-ms 0/5 --debug-paxos 1/5'"
>>
>> -Joao
>>
>>
>>> # ceph -s
>>> cluster 075f1aae-48de-412e-b024-b0f014dbc8cf
>>> health HEALTH_OK
>>> monmap e2: 3 mons at {ceph01-vm=192.168.123.251:6789/0,ceph02-vm=192.168.123.252:6789/0,ceph04-vm=192.168.123.250:6789/0},
>>> election epoch 44, quorum 0,1,2 ceph04-vm,ceph01-vm,ceph02-vm
>>> mdsmap e10: 1/1/1 up {0=ceph06-vm=up:active}
>>> osdmap e145: 10 osds: 10 up, 10 in
>>> pgmap v4394: 2392 pgs, 21 pools, 4503 MB data, 1250 objects
>>> 13657 MB used, 4908 GB / 4930 GB avail
>>> 2392 active+clean
>>>
>>>
>>> 2014-08-22 02:06:34.738828 7ff2b9557700 1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:34.738830 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.618805 7ff2b9557700 1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.618807 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.620019 7ff2b9557700 1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620021 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.620975 7ff2b9557700 1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620977 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:36.629362 7ff2b9557700 0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "mon_status", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.633007 7ff2b9557700 0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.637002 7ff2b9557700 0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "health", "detail": "", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.640971 7ff2b9557700 0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"dumpcontents": ["pgs_brief"], "prefix": "pg dump", "format": "json"} v 0) v1
>>> 2014-08-22 02:06:36.641014 7ff2b9557700 1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.641016 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
>>> 2014-08-22 02:06:37.520387 7ff2b9557700 1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9757) is_readable now=2014-08-22 02:06:37.520388 lease_expire=2014-08-22 02:06:42.501572 has v0 lc 9757
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> --
>> Joao Eduardo Luis
>> Software Engineer | http://inktank.com | http://ceph.com
>>
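
For reference, the data_health "update_stats" line at the top of this message can be checked by hand. The sketch below is only an illustration and is not taken from the Ceph source: it pulls the numbers out of that log fragment and recomputes the available percentage. The 5% threshold (CRIT_PERCENT) is an assumption based on what I believe is the default of "mon data avail crit", and the "less than or equal" comparison and the integer truncation are also assumptions. The function names are made up for this example.

```python
import re

# ASSUMPTION: critical threshold in percent. I believe the Ceph option
# "mon data avail crit" defaults to 5; verify against your own cluster.
CRIT_PERCENT = 5

# Fragment copied from the data_health log line in this thread.
LINE = "update_stats avail 5% total 15798272 used 12941508 avail 926268"

PATTERN = re.compile(
    r"update_stats avail (?P<pct>\d+)% "
    r"total (?P<total>\d+) used (?P<used>\d+) avail (?P<avail>\d+)"
)


def parse_update_stats(line):
    """Return the numeric fields of an update_stats line, or None if absent."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def avail_percent(stats):
    """Available space as a whole-number percentage of total (truncated)."""
    return stats["avail"] * 100 // stats["total"]


def is_critical(stats, crit=CRIT_PERCENT):
    """ASSUMPTION: the monitor treats avail% <= crit as critical."""
    return avail_percent(stats) <= crit


if __name__ == "__main__":
    stats = parse_update_stats(LINE)
    print("reported:", stats["pct"], "recomputed:", avail_percent(stats))
    print("critical:", is_critical(stats))
```

If the totals in that line are in KB, as they appear to be, the filesystem holding the monitor data is only about 15 GB, so it is worth checking with df which filesystem the mon data directory and the large mon logs actually live on.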