Hello Lukas,
Please try the following process for getting all your OSDs up and operational:
* Set the following flags:
for i in noup noin noscrub nodeep-scrub norecover nobackfill; do ceph osd set $i; done
* Stop all OSDs (I know, this seems counterproductive)
* Set the following options in ceph.conf:
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_backfill_scan_min = 8
osd_heartbeat_interval = 36
osd_heartbeat_grace = 240
osd_map_message_max = 1000
osd_map_cache_size = 3136
* Start all OSDs
* Monitor 'top' for 0% CPU on all OSD processes. It may take a while. I usually start 'top', then press the keys M and c:
- M = Sort by memory usage
- c = Show command arguments
- This makes it easy to monitor the OSD processes and see which OSDs have settled.
* Once all OSDs have hit 0% CPU utilization, remove the 'noup' flag
- ceph osd unset noup
* Again, wait for 0% CPU utilization (may be immediate, may take a while; you just have to wait)
* Once all OSDs have hit 0% CPU again, remove the 'noin' flag
- ceph osd unset noin
- All OSDs should now appear up/in and will go through peering.
* Once ceph -s shows no further activity, and OSDs are back at 0% CPU again, unset 'nobackfill'
- ceph osd unset nobackfill
* Once ceph -s shows no further activity, and OSDs are back at 0% CPU again, unset 'norecover'
- ceph osd unset norecover
* Monitor OSD memory usage. Some OSDs may get killed off again, but their subsequent restart should consume less memory and allow more recovery to occur between each of the steps above. Ultimately, and hopefully, your entire cluster will come back online and be usable.
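The "wait for 0% CPU" checks in the steps above can also be scripted instead of watched interactively. What follows is only a minimal sketch and not part of the procedure itself: busy_osds is a hypothetical helper name, and it assumes top's default batch-mode column order (%CPU in field 9, COMMAND in field 12) and a locale that prints decimals with a dot. It reads 'top -b' output on stdin and lists the ceph-osd processes from the last sample that are still above a CPU threshold.

```shell
# Hypothetical helper: list ceph-osd processes still using CPU.
# Reads `top -b` output on stdin; prints "PID %CPU" for every ceph-osd
# process in the LAST sample whose %CPU is above the threshold ($1, default 0).
# Assumes top's default columns: %CPU is field 9, COMMAND is field 12.
busy_osds() {
    awk -v max="${1:-0}" '
        /^top - / { n = 0 }    # a new sample starts: forget the earlier one
        $12 ~ /ceph-osd/ && $9 + 0 > max + 0 { busy[n++] = $1 " " $9 }
        END { for (i = 0; i < n; i++) print busy[i] }
    '
}
```

A possible invocation is `LC_ALL=C top -b -n 2 -d 5 | busy_osds`. Two samples are taken because only the second one reflects current CPU usage; empty output would mean every OSD has settled and the next flag can be unset.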
## Clean-up:
* Remove all of the above set options from ceph.conf
* Reset the running OSDs to their defaults:
ceph tell osd.\* injectargs '--osd_max_backfills 10 --osd_recovery_max_active 15 --osd_recovery_max_single_start 5 --osd_backfill_scan_min 64 --osd_heartbeat_interval 6 --osd_heartbeat_grace 36 --osd_map_message_max 100 --osd_map_cache_size 500'
* Unset the noscrub and nodeep-scrub flags:
- ceph osd unset noscrub
- ceph osd unset nodeep-scrub
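After the clean-up it may be worth confirming that none of the six flags set at the start is still active. The sketch below is an illustration under assumptions, not part of the original advice: remaining_flags is a hypothetical helper name, and it assumes 'ceph osd dump' prints a line of the form "flags a,b,c", which is how the cluster flags are usually shown.

```shell
# Hypothetical helper: report which of the flags set at the start of the
# procedure are still active. Reads `ceph osd dump` output on stdin and
# prints one leftover flag per line; other cluster flags are ignored.
remaining_flags() {
    awk '
        $1 == "flags" {
            n = split($2, set, ",")
            for (i = 1; i <= n; i++)
                if (set[i] ~ /^(noup|noin|noscrub|nodeep-scrub|norecover|nobackfill)$/)
                    print set[i]
        }
    '
}
```

Running `ceph osd dump | remaining_flags` should print nothing once every flag has been unset.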
## For help identifying why memory usage was so high, please provide:
* ceph osd dump | grep pool
* ceph osd crush rule dump
Let us know if this helps. I know it looks extreme, but it's worked for me in the past.
Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services
- by Red Hat
On Wed, Oct 29, 2014 at 8:51 AM, Lukáš Kubín <lukas.kubin@xxxxxxxxx> wrote:
Hello,
I found my Ceph v0.80.3 cluster with 5 of 34 OSDs down overnight, after months of running without change. From the Linux logs I found that the OSD processes were killed because they had consumed all available memory.
Those 5 failed OSDs were on different hosts of my 4-node cluster (see below). Two hosts act as an SSD cache tier for some of my pools. The other two hosts are the default rotational-drive storage.
After checking that Linux was not out of memory, I attempted to restart the failed OSDs. Most of those OSD daemons exhausted all memory within seconds and were killed by Linux again:
Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 (ceph-osd) score 867 or sacrifice child
Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd) total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB
On the host I found lots of similar "slow request" messages preceding the crash:
2014-10-28 22:11:20.885527 7f25f84d1700 0 log [WRN] : slow request 31.117125 seconds old, received at 2014-10-28 22:10:49.768291: osd_sub_op(client.168752.0:2197931 14.2c7 888596c7/rbd_data.293272f8695e4.000000000000006f/head//14 [] v 1551'377417 snapset=0=[]:[] snapc=0=[]) v10 currently no flag points reached
2014-10-28 22:11:21.885668 7f25f84d1700 0 log [WRN] : 67 slow requests, 1 included below; oldest blocked for > 9879.304770 secs
Apparently I can't get the cluster fixed by restarting the OSDs over and over. Is there any other option?
Thank you.
Lukas Kubin
[root@q04 ~]# ceph -s
    cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99
     health HEALTH_ERR 9 pgs backfill; 1 pgs backfilling; 521 pgs degraded; 425 pgs incomplete; 13 pgs inconsistent; 20 pgs recovering; 50 pgs recovery_wait; 151 pgs stale; 425 pgs stuck inactive; 151 pgs stuck stale; 1164 pgs stuck unclean; 12070270 requests are blocked > 32 sec; recovery 887322/35206223 objects degraded (2.520%); 119/17131232 unfound (0.001%); 13 scrub errors
     monmap e2: 3 mons at {q03=10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0}, election epoch 90, quorum 0,1,2 q03,q04,q05
     osdmap e2194: 34 osds: 31 up, 31 in
      pgmap v7429812: 5632 pgs, 7 pools, 1446 GB data, 16729 kobjects
            2915 GB used, 12449 GB / 15365 GB avail
            887322/35206223 objects degraded (2.520%); 119/17131232 unfound (0.001%)
                  38 active+recovery_wait+remapped
                4455 active+clean
                  65 stale+incomplete
                   3 active+recovering+remapped
                 359 incomplete
                  12 active+recovery_wait
                 139 active+remapped
                  86 stale+active+degraded
                  16 active+recovering
                   1 active+remapped+backfilling
                  13 active+clean+inconsistent
                   9 active+remapped+wait_backfill
                 434 active+degraded
                   1 remapped+incomplete
                   1 active+recovering+degraded+remapped
  client io 0 B/s rd, 469 kB/s wr, 48 op/s
[root@q04 ~]# ceph osd tree
# id    weight  type name       up/down reweight
-5      3.24    root ssd
-6      1.62            host q06
16      0.18                    osd.16  up      1
17      0.18                    osd.17  up      1
18      0.18                    osd.18  up      1
19      0.18                    osd.19  up      1
20      0.18                    osd.20  up      1
21      0.18                    osd.21  up      1
22      0.18                    osd.22  up      1
23      0.18                    osd.23  up      1
24      0.18                    osd.24  up      1
-7      1.62            host q07
25      0.18                    osd.25  up      1
26      0.18                    osd.26  up      1
27      0.18                    osd.27  up      1
28      0.18                    osd.28  up      1
29      0.18                    osd.29  up      1
30      0.18                    osd.30  up      1
31      0.18                    osd.31  up      1
32      0.18                    osd.32  up      1
33      0.18                    osd.33  up      1
-1      14.56   root default
-4      14.56           root sata
-2      7.28                    host q08
0       0.91                            osd.0   up      1
1       0.91                            osd.1   up      1
2       0.91                            osd.2   up      1
3       0.91                            osd.3   up      1
11      0.91                            osd.11  up      1
12      0.91                            osd.12  up      1
13      0.91                            osd.13  down    0
14      0.91                            osd.14  up      1
-3      7.28                    host q09
4       0.91                            osd.4   up      1
5       0.91                            osd.5   up      1
6       0.91                            osd.6   up      1
7       0.91                            osd.7   up      1
8       0.91                            osd.8   down    0
9       0.91                            osd.9   up      1
10      0.91                            osd.10  down    0
15      0.91                            osd.15  up      1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com