Dimitar,

I agree that getting the cluster into a healthy state first is probably the better idea. Based on your 'ceph pg map' output, it looks like you're using only 1 replica. Any idea why that would be? With 3 replicas, the output should look like this:

osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]

Bryan

From: Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>
Date: Tuesday, February 23, 2016 at 1:08 AM
To: CTG User <bryan.stillwell@xxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: RE: osd not removed from crush map after ceph osd crush remove

>Hello,
>Thank you Bryan.
>
>I was just trying to upgrade to hammer or upper but before that I was
>wanting to get the cluster in Healthy state.
>Do you think it is safe to upgrade now first to latest firefly then to
>Hammer ?
>
>Regards.
>
>Dimitar Boichev
>SysAdmin Team Lead
>AXSMarine Sofia
>Phone: +359 889 22 55 42
>Skype: dimitar.boichev.axsmarine
>E-mail:
>dimitar.boichev@xxxxxxxxxxxxx
>
>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>On Behalf Of Stillwell, Bryan
>Sent: Tuesday, February 23, 2016 1:51 AM
>To: ceph-users@xxxxxxxxxxxxxx
>Subject: Re: osd not removed from crush map after ceph osd crush remove
>
>Dimitar,
>
>I'm not sure why those PGs would be stuck in the stale+active+clean
>state. Maybe try upgrading to the 0.80.11 release to see if it's a bug
>that was fixed already? You can use the 'ceph tell osd.* version'
>command after the upgrade to make sure all OSDs are running the
>new version. Also since firefly (0.80.x) is near its EOL, you should
>consider upgrading to hammer (0.94.x).
>
>As for why osd.4 didn't get fully removed, the last command you ran isn't
>correct. It should be 'ceph osd rm 4'. Trying to remember when to use
>the CRUSH name (osd.4) versus the OSD number (4) can be a pain.
>
>Bryan
>
>From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>
>Date: Monday, February 22, 2016 at 1:10 AM
>To: Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
>Subject: Re: osd not removed from crush map after ceph osd crush remove
>
>>Anyone ?
>>
>>Regards.
>>
>>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>>On Behalf Of Dimitar Boichev
>>Sent: Thursday, February 18, 2016 5:06 PM
>>To: ceph-users@xxxxxxxxxxxxxx
>>Subject: osd not removed from crush map after ceph osd crush remove
>>
>>Hello,
>>I am running a tiny cluster of 2 nodes.
>>ceph -v
>>ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>>One osd died and I added a new osd (not replacing the old one).
>>After that I wanted to remove the failed osd completely from the cluster.
>>Here is what I did:
>>ceph osd reweight osd.4 0.0
>>ceph osd crush reweight osd.4 0.0
>>ceph osd out osd.4
>>ceph osd crush remove osd.4
>>ceph auth del osd.4
>>ceph osd rm osd.4
>>
>>But after the rebalancing I ended up with 155 PGs in stale+active+clean
>>state.
>>
>>@storage1:/tmp# ceph -s
>>    cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
>>     health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; nodeep-scrub flag(s) set
>>     monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch 1, quorum 0 storage1
>>     osdmap e1064: 6 osds: 6 up, 6 in
>>            flags nodeep-scrub
>>      pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
>>            1209 GB used, 14210 GB / 15419 GB avail
>>                 155 stale+active+clean
>>                 557 active+clean
>>  client io 91925 B/s wr, 5 op/s
>>
>>I know about the 1 monitor problem I just want to fix the cluster to
>>healthy state then I will add the third storage node and go up to 3
>>monitors.
>>
>>The problem is as follows:
>>@storage1:/tmp# ceph pg map 2.3a
>>osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]
>>@storage1:/tmp# ceph pg 2.3a query
>>Error ENOENT: i don't have pgid 2.3a
>>
>>@storage1:/tmp# ceph health detail
>>HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; 1 osds have slow requests; nodeep-scrub flag(s) set
>>pg 7.2a is stuck stale for 8887559.656879, current state stale+active+clean, last acting [4]
>>pg 5.28 is stuck stale for 8887559.656886, current state stale+active+clean, last acting [4]
>>pg 7.2b is stuck stale for 8887559.656889, current state stale+active+clean, last acting [4]
>>pg 7.2c is stuck stale for 8887559.656892, current state stale+active+clean, last acting [4]
>>pg 0.2b is stuck stale for 8887559.656893, current state stale+active+clean, last acting [4]
>>pg 6.2c is stuck stale for 8887559.656894, current state stale+active+clean, last acting [4]
>>pg 6.2f is stuck stale for 8887559.656893, current state stale+active+clean, last acting [4]
>>pg 2.2b is stuck stale for 8887559.656896, current state stale+active+clean, last acting [4]
>>pg 2.25 is stuck stale for 8887559.656896, current state stale+active+clean, last acting [4]
>>pg 6.20 is stuck stale for 8887559.656898, current state stale+active+clean, last acting [4]
>>pg 5.21 is stuck stale for 8887559.656898, current state stale+active+clean, last acting [4]
>>pg 0.24 is stuck stale for 8887559.656904, current state stale+active+clean, last acting [4]
>>pg 2.21 is stuck stale for 8887559.656904, current state stale+active+clean, last acting [4]
>>pg 5.27 is stuck stale for 8887559.656906, current state stale+active+clean, last acting [4]
>>pg 2.23 is stuck stale for 8887559.656908, current state stale+active+clean, last acting [4]
>>pg 6.26 is stuck stale for 8887559.656909, current state stale+active+clean, last acting [4]
>>pg 7.27 is stuck stale for 8887559.656913, current state
>>stale+active+clean, last acting [4]
>>pg 7.18 is stuck stale for 8887559.656914, current state stale+active+clean, last acting [4]
>>pg 0.1e is stuck stale for 8887559.656914, current state stale+active+clean, last acting [4]
>>pg 6.18 is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]
>>pg 2.1f is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]
>>pg 7.1b is stuck stale for 8887559.656922, current state stale+active+clean, last acting [4]
>>pg 0.1b is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]
>>pg 6.1d is stuck stale for 8887559.656925, current state stale+active+clean, last acting [4]
>>pg 2.18 is stuck stale for 8887559.656920, current state stale+active+clean, last acting [4]
>>pg 7.1d is stuck stale for 8887559.656926, current state stale+active+clean, last acting [4]
>>pg 5.1c is stuck stale for 8887559.656921, current state stale+active+clean, last acting [4]
>>pg 5.1d is stuck stale for 8887559.656920, current state stale+active+clean, last acting [4]
>>pg 6.11 is stuck stale for 8887559.656922, current state stale+active+clean, last acting [4]
>>pg 5.13 is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]
>>pg 0.16 is stuck stale for 8887559.656924, current state stale+active+clean, last acting [4]
>>pg 6.10 is stuck stale for 8887559.656928, current state stale+active+clean, last acting [4]
>>pg 2.17 is stuck stale for 8887559.656927, current state stale+active+clean, last acting [4]
>>pg 7.12 is stuck stale for 8887559.656932, current state stale+active+clean, last acting [4]
>>pg 0.12 is stuck stale for 8887559.656929, current state stale+active+clean, last acting [4]
>>pg 6.14 is stuck stale for 8887559.656935, current state stale+active+clean, last acting [4]
>>pg 0.11 is stuck stale for 8887559.656932, current state stale+active+clean, last acting [4]
>>pg 7.16 is stuck
stale for 8887559.656936, current state stale+active+clean, last acting [4]
>>pg 0.10 is stuck stale for 8887559.656936, current state stale+active+clean, last acting [4]
>>pg 2.d is stuck stale for 8887559.656933, current state stale+active+clean, last acting [4]
>>pg 6.9 is stuck stale for 8887559.656939, current state stale+active+clean, last acting [4]
>>pg 7.9 is stuck stale for 8887559.656939, current state stale+active+clean, last acting [4]
>>pg 0.d is stuck stale for 8887559.656940, current state stale+active+clean, last acting [4]
>>pg 7.a is stuck stale for 8887559.656944, current state stale+active+clean, last acting [4]
>>pg 0.c is stuck stale for 8887559.656941, current state stale+active+clean, last acting [4]
>>pg 2.e is stuck stale for 8887559.656947, current state stale+active+clean, last acting [4]
>>pg 6.a is stuck stale for 8887559.656953, current state stale+active+clean, last acting [4]
>>pg 0.b is stuck stale for 8887559.656949, current state stale+active+clean, last acting [4]
>>pg 2.9 is stuck stale for 8887559.656954, current state stale+active+clean, last acting [4]
>>pg 5.f is stuck stale for 8887559.656953, current state stale+active+clean, last acting [4]
>>pg 7.d is stuck stale for 8887559.656958, current state stale+active+clean, last acting [4]
>>pg 6.f is stuck stale for 8887559.656957, current state stale+active+clean, last acting [4]
>>pg 3.4 is stuck stale for 8887559.656957, current state stale+active+clean, last acting [4]
>>pg 5.3 is stuck stale for 8887559.656956, current state stale+active+clean, last acting [4]
>>pg 2.4 is stuck stale for 8887559.656961, current state stale+active+clean, last acting [4]
>>pg 6.0 is stuck stale for 8887559.656966, current state stale+active+clean, last acting [4]
>>pg 3.6 is stuck stale for 8887559.656965, current state stale+active+clean, last acting [4]
>>pg 3.7 is stuck stale for 8887559.656964, current state stale+active+clean, last acting [4]
>>pg 2.6 is stuck stale for 8887559.656970, current state stale+active+clean, last acting [4]
>>pg 0.3 is stuck stale for 8887559.656965, current state stale+active+clean, last acting [4]
>>pg 5.6 is stuck stale for 8887559.656970, current state stale+active+clean, last acting [4]
>>pg 7.4 is stuck stale for 8887559.656975, current state stale+active+clean, last acting [4]
>>pg 3.1 is stuck stale for 8887559.656970, current state stale+active+clean, last acting [4]
>>pg 6.4 is stuck stale for 8887559.656975, current state stale+active+clean, last acting [4]
>>pg 5.4 is stuck stale for 8887559.656972, current state stale+active+clean, last acting [4]
>>pg 2.3 is stuck stale for 8887559.656977, current state stale+active+clean, last acting [4]
>>pg 5.5 is stuck stale for 8887559.656977, current state stale+active+clean, last acting [4]
>>pg 3.3 is stuck stale for 8887559.656982, current state stale+active+clean, last acting [4]
>>pg 5.7a is stuck stale for 8887559.657309, current state stale+active+clean, last acting [4]
>>pg 6.78 is stuck stale for 8887559.657308, current state stale+active+clean, last acting [4]
>>pg 5.78 is stuck stale for 8887559.657311, current state stale+active+clean, last acting [4]
>>pg 5.79 is stuck stale for 8887559.657311, current state stale+active+clean, last acting [4]
>>pg 6.7c is stuck stale for 8887559.657313, current state stale+active+clean, last acting [4]
>>pg 7.7e is stuck stale for 8887559.657312, current state stale+active+clean, last acting [4]
>>pg 6.7e is stuck stale for 8887559.657315, current state stale+active+clean, last acting [4]
>>pg 7.70 is stuck stale for 8887559.657316, current state stale+active+clean, last acting [4]
>>pg 6.73 is stuck stale for 8887559.657316, current state stale+active+clean, last acting [4]
>>pg 5.77 is stuck stale for 8887559.657317, current state stale+active+clean, last acting [4]
>>pg 5.74 is stuck stale for 8887559.657319, current state
>>stale+active+clean, last acting [4]
>>pg 5.75 is stuck stale for 8887559.657321, current state stale+active+clean, last acting [4]
>>pg 7.68 is stuck stale for 8887559.657322, current state stale+active+clean, last acting [4]
>>pg 6.68 is stuck stale for 8887559.657324, current state stale+active+clean, last acting [4]
>>pg 7.6b is stuck stale for 8887559.657326, current state stale+active+clean, last acting [4]
>>pg 6.6d is stuck stale for 8887559.657328, current state stale+active+clean, last acting [4]
>>pg 5.6e is stuck stale for 8887559.657330, current state stale+active+clean, last acting [4]
>>pg 6.6c is stuck stale for 8887559.657330, current state stale+active+clean, last acting [4]
>>pg 7.6f is stuck stale for 8887559.657331, current state stale+active+clean, last acting [4]
>>pg 7.60 is stuck stale for 8887559.657333, current state stale+active+clean, last acting [4]
>>pg 6.60 is stuck stale for 8887559.657333, current state stale+active+clean, last acting [4]
>>pg 7.62 is stuck stale for 8887559.657334, current state stale+active+clean, last acting [4]
>>pg 6.65 is stuck stale for 8887559.657334, current state stale+active+clean, last acting [4]
>>pg 7.64 is stuck stale for 8887559.657339, current state stale+active+clean, last acting [4]
>>pg 5.67 is stuck stale for 8887559.657338, current state stale+active+clean, last acting [4]
>>pg 7.66 is stuck stale for 8887559.657340, current state stale+active+clean, last acting [4]
>>pg 6.66 is stuck stale for 8887559.657340, current state stale+active+clean, last acting [4]
>>pg 7.67 is stuck stale for 8887559.657345, current state stale+active+clean, last acting [4]
>>pg 6.59 is stuck stale for 8887559.657344, current state stale+active+clean, last acting [4]
>>pg 7.58 is stuck stale for 8887559.657348, current state stale+active+clean, last acting [4]
>>pg 6.58 is stuck stale for 8887559.657348, current state stale+active+clean, last acting [4]
>>pg 7.59 is stuck
stale for 8887559.657352, current state stale+active+clean, last acting [4]
>>pg 6.5b is stuck stale for 8887559.657353, current state stale+active+clean, last acting [4]
>>pg 5.59 is stuck stale for 8887559.657348, current state stale+active+clean, last acting [4]
>>pg 6.5a is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]
>>pg 5.5e is stuck stale for 8887559.657352, current state stale+active+clean, last acting [4]
>>pg 6.5d is stuck stale for 8887559.657358, current state stale+active+clean, last acting [4]
>>pg 6.5f is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]
>>pg 7.51 is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]
>>pg 7.52 is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]
>>pg 7.53 is stuck stale for 8887559.657358, current state stale+active+clean, last acting [4]
>>pg 6.55 is stuck stale for 8887559.657359, current state stale+active+clean, last acting [4]
>>pg 7.54 is stuck stale for 8887559.657364, current state stale+active+clean, last acting [4]
>>pg 6.54 is stuck stale for 8887559.657364, current state stale+active+clean, last acting [4]
>>pg 6.57 is stuck stale for 8887559.657365, current state stale+active+clean, last acting [4]
>>pg 7.56 is stuck stale for 8887559.657369, current state stale+active+clean, last acting [4]
>>pg 5.55 is stuck stale for 8887559.657371, current state stale+active+clean, last acting [4]
>>pg 7.48 is stuck stale for 8887559.657372, current state stale+active+clean, last acting [4]
>>pg 6.49 is stuck stale for 8887559.657375, current state stale+active+clean, last acting [4]
>>pg 5.4a is stuck stale for 8887559.657376, current state stale+active+clean, last acting [4]
>>pg 6.48 is stuck stale for 8887559.657379, current state stale+active+clean, last acting [4]
>>pg 7.4a is stuck stale for 8887559.657380, current state
>>stale+active+clean, last acting [4]
>>pg 6.4a is stuck stale for 8887559.657383, current state stale+active+clean, last acting [4]
>>pg 6.4d is stuck stale for 8887559.657385, current state stale+active+clean, last acting [4]
>>pg 7.4d is stuck stale for 8887559.657387, current state stale+active+clean, last acting [4]
>>pg 6.4c is stuck stale for 8887559.657389, current state stale+active+clean, last acting [4]
>>pg 6.4e is stuck stale for 8887559.657391, current state stale+active+clean, last acting [4]
>>pg 5.42 is stuck stale for 8887559.657391, current state stale+active+clean, last acting [4]
>>pg 6.43 is stuck stale for 8887559.657393, current state stale+active+clean, last acting [4]
>>pg 5.41 is stuck stale for 8887559.657393, current state stale+active+clean, last acting [4]
>>pg 5.47 is stuck stale for 8887559.657394, current state stale+active+clean, last acting [4]
>>pg 7.46 is stuck stale for 8887559.657396, current state stale+active+clean, last acting [4]
>>pg 6.39 is stuck stale for 8887559.657398, current state stale+active+clean, last acting [4]
>>pg 5.3a is stuck stale for 8887559.657399, current state stale+active+clean, last acting [4]
>>pg 2.3e is stuck stale for 8887559.657399, current state stale+active+clean, last acting [4]
>>pg 0.3c is stuck stale for 8887559.657402, current state stale+active+clean, last acting [4]
>>pg 7.3c is stuck stale for 8887559.657404, current state stale+active+clean, last acting [4]
>>pg 7.3d is stuck stale for 8887559.657405, current state stale+active+clean, last acting [4]
>>pg 0.39 is stuck stale for 8887559.657402, current state stale+active+clean, last acting [4]
>>pg 5.3c is stuck stale for 8887559.657405, current state stale+active+clean, last acting [4]
>>pg 2.3a is stuck stale for 8887559.657406, current state stale+active+clean, last acting [4]
>>pg 0.38 is stuck stale for 8887559.657409, current state stale+active+clean, last acting [4]
>>pg 2.35 is stuck
stale for 8887559.657411, current state stale+active+clean, last acting [4]
>>pg 0.37 is stuck stale for 8887559.657412, current state stale+active+clean, last acting [4]
>>pg 5.32 is stuck stale for 8887559.657413, current state stale+active+clean, last acting [4]
>>pg 2.34 is stuck stale for 8887559.657416, current state stale+active+clean, last acting [4]
>>pg 0.36 is stuck stale for 8887559.657416, current state stale+active+clean, last acting [4]
>>pg 7.32 is stuck stale for 8887559.657419, current state stale+active+clean, last acting [4]
>>pg 6.33 is stuck stale for 8887559.657420, current state stale+active+clean, last acting [4]
>>pg 0.35 is stuck stale for 8887559.657423, current state stale+active+clean, last acting [4]
>>pg 6.35 is stuck stale for 8887559.657423, current state stale+active+clean, last acting [4]
>>pg 5.36 is stuck stale for 8887559.657424, current state stale+active+clean, last acting [4]
>>pg 2.30 is stuck stale for 8887559.657427, current state stale+active+clean, last acting [4]
>>pg 5.37 is stuck stale for 8887559.657429, current state stale+active+clean, last acting [4]
>>pg 7.36 is stuck stale for 8887559.657430, current state stale+active+clean, last acting [4]
>>pg 6.37 is stuck stale for 8887559.657432, current state stale+active+clean, last acting [4]
>>pg 6.28 is stuck stale for 8887559.657427, current state stale+active+clean, last acting [4]
>>
>>This stays that way and I think this is because when I downloaded and
>>decompiled the crush map I discovered this:
>>@storage1:/tmp# crushtool -d /tmp/crushmap
>># begin crush map
>>tunable choose_local_tries 0
>>tunable choose_local_fallback_tries 0
>>tunable choose_total_tries 50
>>tunable chooseleaf_descend_once 1
>>
>># devices
>>device 0 osd.0
>>device 1 osd.1
>>device 2 osd.2
>>device 3 osd.3
>>device 4 device4
>>device 5 osd.5
>>device 6 osd.6
>>
>>Is there a way to remove this device 4 aka osd.4 from here so ceph can make another
copy from the other location shown in "ceph pg map 2.3a" ?
>>
>>Regards.
>>
>>Dimitar Boichev
>>SysAdmin Team Lead
>>AXSMarine Sofia
>>Phone: +359 889 22 55 42
>>Skype: dimitar.boichev.axsmarine
>>E-mail:
>>dimitar.boichev@xxxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
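
For anyone reading this thread later, here is a minimal Python sketch (not part of the original exchange) of the two checks discussed above: how many OSDs are in a PG's acting set, and which OSD the stale PGs last lived on. It assumes the 'ceph pg map' and 'ceph health detail' text formats exactly as quoted in this thread (firefly 0.80.x), with each entry on one unwrapped line. The names parse_pg_map and stale_pgs_by_osd are hypothetical helpers made up for illustration, not Ceph APIs.

```python
import re
from collections import Counter

# Matches e.g. "osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]"
PG_MAP_RE = re.compile(
    r"pg (\S+) \(\S+\) -> up \[([\d,]*)\] acting \[([\d,]*)\]"
)

# Matches e.g. "pg 7.2a is stuck stale for 8887559.656879, "
#              "current state stale+active+clean, last acting [4]"
STALE_RE = re.compile(
    r"pg (\S+) is stuck stale for [\d.]+, "
    r"current state \S+, last acting \[([\d,]*)\]"
)


def _osd_list(text):
    """Turn '13,58,37' into [13, 58, 37]; an empty string gives []."""
    return [int(part) for part in text.split(",") if part]


def parse_pg_map(line):
    """Return (pgid, up_osds, acting_osds) from one 'ceph pg map' line."""
    match = PG_MAP_RE.search(line)
    if match is None:
        raise ValueError("not a recognised 'ceph pg map' line: %r" % line)
    return match.group(1), _osd_list(match.group(2)), _osd_list(match.group(3))


def stale_pgs_by_osd(health_detail):
    """Count stuck-stale PGs per last-acting OSD in 'ceph health detail' text."""
    counts = Counter()
    for match in STALE_RE.finditer(health_detail):
        for osd in _osd_list(match.group(2)):
            counts[osd] += 1
    return counts


# Example with the line quoted in this thread:
#   parse_pg_map("osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]")
#   returns ('2.3a', [6], [6]), i.e. a single OSD in the acting set.
```

Run against the output quoted in this thread, the acting set for pg 2.3a contains a single OSD, and every stuck-stale entry names osd 4 as its last acting OSD.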