Re: osd not removed from crush map after ceph osd crush remove


 



Dimitar,

I would agree with you that getting the cluster into a healthy state first
is probably the better idea.  Based on your pg query, it appears that
you're using only 1 replica.  Any idea why that would be?

The output should look like this (with 3 replicas):

osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]
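One quick way to read the replica count from output like this is to count the OSD ids in the acting set. Below is a minimal POSIX sh sketch. It assumes the single-line 'ceph pg map' format quoted in this thread, and the function name acting_replicas is hypothetical, not part of Ceph. The pool's configured replica count can be read directly with 'ceph osd pool get <pool> size'.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print how many OSDs are in the
# "acting [...]" set of one line of `ceph pg map <pgid>` output.
acting_replicas() {
    # sed keeps only the comma-separated ids inside "acting [...]";
    # awk then counts them by splitting on commas.
    printf '%s\n' "$1" |
        sed -n 's/.*acting \[\([0-9,]*\)].*/\1/p' |
        awk -F, '{ print NF }'
}

# The two pg map lines quoted in this thread:
acting_replicas 'osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]'  # prints 3
acting_replicas 'osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]'                      # prints 1
```

A result of 1 for every PG in a pool would be consistent with that pool having size 1.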

Bryan

From:  Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>
Date:  Tuesday, February 23, 2016 at 1:08 AM
To:  CTG User <bryan.stillwell@xxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx"
<ceph-users@xxxxxxxxxxxxxx>
Subject:  RE:  osd not removed from crush map after ceph osd
crush remove


>Hello,
>Thank you Bryan.
>
>I was about to upgrade to hammer or later, but before that I wanted
>to get the cluster into a healthy state.
>Do you think it is safe to upgrade now, first to the latest firefly and
>then to hammer?
>
>
>Regards.
>
>Dimitar Boichev
>SysAdmin Team Lead
>AXSMarine Sofia
>Phone: +359 889 22 55 42
>Skype: dimitar.boichev.axsmarine
>E-mail:
>dimitar.boichev@xxxxxxxxxxxxx
>
>
>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>On Behalf Of Stillwell, Bryan
>Sent: Tuesday, February 23, 2016 1:51 AM
>To: ceph-users@xxxxxxxxxxxxxx
>Subject: Re:  osd not removed from crush map after ceph osd
>crush remove
>
>
>
>Dimitar,
>
>
>
>I'm not sure why those PGs would be stuck in the stale+active+clean
>state.  Maybe try upgrading to the 0.80.11 release to see whether it's a
>bug that has already been fixed?  After the upgrade, you can use the
>'ceph tell osd.* version' command to make sure all OSDs are running the
>new version.  Also, since firefly (0.80.x) is near its EOL, you should
>consider upgrading to hammer (0.94.x).
>
>
>
>As for why osd.4 didn't get fully removed, the last command you ran isn't
>correct.  It should be 'ceph osd rm 4'.  Remembering when to use the
>CRUSH name (osd.4) versus the OSD number (4) can be a pain.
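The name-versus-number distinction can be made explicit with a dry-run helper that prints the manual removal commands, each with the form it expects. This is a sketch, not a Ceph tool: the function name print_osd_removal is hypothetical, and it only echoes commands. The sequence follows the usual manual OSD-removal procedure (mark out and let data rebalance, stop the daemon, remove from CRUSH, delete the auth key, remove the OSD). Here osd.4 had already died, so there was no daemon to stop.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the manual OSD-removal commands,
# using the numeric id where a number is expected and the CRUSH name
# (osd.N) where a name is expected. Accepts either "4" or "osd.4".
print_osd_removal() {
    id=${1#osd.}                      # strip a leading "osd." -> "4"
    case $id in
        ''|*[!0-9]*) echo "not an OSD id: $1" >&2; return 1 ;;
    esac
    name="osd.$id"
    echo "ceph osd out $id"           # numeric id
    echo "ceph osd crush remove $name"  # CRUSH name
    echo "ceph auth del $name"        # auth entity name
    echo "ceph osd rm $id"            # numeric id, not osd.N
}

print_osd_removal osd.4
```

For osd.4 this prints the four commands ending in 'ceph osd rm 4', which is the form Bryan points out above.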
>
>
>
>Bryan
>
>
>
>From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Dimitar
>Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>
>Date: Monday, February 22, 2016 at 1:10 AM
>To: Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>,
>"ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
>Subject: Re:  osd not removed from crush map after ceph osd
>crush remove
>
>
>
>>Anyone?
>>
>>Regards.
>>
>>
>>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>>On Behalf Of Dimitar Boichev
>>Sent: Thursday, February 18, 2016 5:06 PM
>>To: ceph-users@xxxxxxxxxxxxxx
>>Subject:  osd not removed from crush map after ceph osd
>>crush remove
>>
>>
>>
>>Hello,
>>I am running a tiny cluster of 2 nodes.
>>ceph -v
>>ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>>One OSD died, and I added a new OSD (rather than replacing the old one).
>>After that I wanted to remove the failed osd completely from the cluster.
>>Here is what I did:
>>ceph osd reweight osd.4 0.0
>>ceph osd crush reweight osd.4 0.0
>>ceph osd out osd.4
>>ceph osd crush remove osd.4
>>ceph auth del osd.4
>>ceph osd rm osd.4
>>
>>
>>But after the rebalancing I ended up with 155 PGs in stale+active+clean
>>state.
>>
>>@storage1:/tmp# ceph -s
>>    cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
>>     health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; nodeep-scrub flag(s) set
>>     monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch 1, quorum 0 storage1
>>     osdmap e1064: 6 osds: 6 up, 6 in
>>            flags nodeep-scrub
>>      pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
>>            1209 GB used, 14210 GB / 15419 GB avail
>>                 155 stale+active+clean
>>                 557 active+clean
>>  client io 91925 B/s wr, 5 op/s
>>
>>I know about the single-monitor problem; I just want to get the cluster
>>to a healthy state first, then I will add the third storage node and go
>>up to 3 monitors.
>>
>>The problem is as follows:
>>@storage1:/tmp# ceph pg map 2.3a
>>osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]
>>@storage1:/tmp# ceph pg 2.3a query
>>Error ENOENT: i don't have pgid 2.3a
>>
>>
>>@storage1:/tmp# ceph health detail
>>HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked >
>>32 sec; 1 osds have slow requests; nodeep-scrub flag(s) set
>>pg 7.2a is stuck stale for 8887559.656879, current state
>>stale+active+clean, last acting [4]
>>pg 5.28 is stuck stale for 8887559.656886, current state
>>stale+active+clean, last acting [4]
>>pg 7.2b is stuck stale for 8887559.656889, current state
>>stale+active+clean, last acting [4]
>>pg 7.2c is stuck stale for 8887559.656892, current state
>>stale+active+clean, last acting [4]
>>pg 0.2b is stuck stale for 8887559.656893, current state
>>stale+active+clean, last acting [4]
>>pg 6.2c is stuck stale for 8887559.656894, current state
>>stale+active+clean, last acting [4]
>>pg 6.2f is stuck stale for 8887559.656893, current state
>>stale+active+clean, last acting [4]
>>pg 2.2b is stuck stale for 8887559.656896, current state
>>stale+active+clean, last acting [4]
>>pg 2.25 is stuck stale for 8887559.656896, current state
>>stale+active+clean, last acting [4]
>>pg 6.20 is stuck stale for 8887559.656898, current state
>>stale+active+clean, last acting [4]
>>pg 5.21 is stuck stale for 8887559.656898, current state
>>stale+active+clean, last acting [4]
>>pg 0.24 is stuck stale for 8887559.656904, current state
>>stale+active+clean, last acting [4]
>>pg 2.21 is stuck stale for 8887559.656904, current state
>>stale+active+clean, last acting [4]
>>pg 5.27 is stuck stale for 8887559.656906, current state
>>stale+active+clean, last acting [4]
>>pg 2.23 is stuck stale for 8887559.656908, current state
>>stale+active+clean, last acting [4]
>>pg 6.26 is stuck stale for 8887559.656909, current state
>>stale+active+clean, last acting [4]
>>pg 7.27 is stuck stale for 8887559.656913, current state
>>stale+active+clean, last acting [4]
>>pg 7.18 is stuck stale for 8887559.656914, current state
>>stale+active+clean, last acting [4]
>>pg 0.1e is stuck stale for 8887559.656914, current state
>>stale+active+clean, last acting [4]
>>pg 6.18 is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 2.1f is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 7.1b is stuck stale for 8887559.656922, current state
>>stale+active+clean, last acting [4]
>>pg 0.1b is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 6.1d is stuck stale for 8887559.656925, current state
>>stale+active+clean, last acting [4]
>>pg 2.18 is stuck stale for 8887559.656920, current state
>>stale+active+clean, last acting [4]
>>pg 7.1d is stuck stale for 8887559.656926, current state
>>stale+active+clean, last acting [4]
>>pg 5.1c is stuck stale for 8887559.656921, current state
>>stale+active+clean, last acting [4]
>>pg 5.1d is stuck stale for 8887559.656920, current state
>>stale+active+clean, last acting [4]
>>pg 6.11 is stuck stale for 8887559.656922, current state
>>stale+active+clean, last acting [4]
>>pg 5.13 is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 0.16 is stuck stale for 8887559.656924, current state
>>stale+active+clean, last acting [4]
>>pg 6.10 is stuck stale for 8887559.656928, current state
>>stale+active+clean, last acting [4]
>>pg 2.17 is stuck stale for 8887559.656927, current state
>>stale+active+clean, last acting [4]
>>pg 7.12 is stuck stale for 8887559.656932, current state
>>stale+active+clean, last acting [4]
>>pg 0.12 is stuck stale for 8887559.656929, current state
>>stale+active+clean, last acting [4]
>>pg 6.14 is stuck stale for 8887559.656935, current state
>>stale+active+clean, last acting [4]
>>pg 0.11 is stuck stale for 8887559.656932, current state
>>stale+active+clean, last acting [4]
>>pg 7.16 is stuck stale for 8887559.656936, current state
>>stale+active+clean, last acting [4]
>>pg 0.10 is stuck stale for 8887559.656936, current state
>>stale+active+clean, last acting [4]
>>pg 2.d is stuck stale for 8887559.656933, current state
>>stale+active+clean, last acting [4]
>>pg 6.9 is stuck stale for 8887559.656939, current state
>>stale+active+clean, last acting [4]
>>pg 7.9 is stuck stale for 8887559.656939, current state
>>stale+active+clean, last acting [4]
>>pg 0.d is stuck stale for 8887559.656940, current state
>>stale+active+clean, last acting [4]
>>pg 7.a is stuck stale for 8887559.656944, current state
>>stale+active+clean, last acting [4]
>>pg 0.c is stuck stale for 8887559.656941, current state
>>stale+active+clean, last acting [4]
>>pg 2.e is stuck stale for 8887559.656947, current state
>>stale+active+clean, last acting [4]
>>pg 6.a is stuck stale for 8887559.656953, current state
>>stale+active+clean, last acting [4]
>>pg 0.b is stuck stale for 8887559.656949, current state
>>stale+active+clean, last acting [4]
>>pg 2.9 is stuck stale for 8887559.656954, current state
>>stale+active+clean, last acting [4]
>>pg 5.f is stuck stale for 8887559.656953, current state
>>stale+active+clean, last acting [4]
>>pg 7.d is stuck stale for 8887559.656958, current state
>>stale+active+clean, last acting [4]
>>pg 6.f is stuck stale for 8887559.656957, current state
>>stale+active+clean, last acting [4]
>>pg 3.4 is stuck stale for 8887559.656957, current state
>>stale+active+clean, last acting [4]
>>pg 5.3 is stuck stale for 8887559.656956, current state
>>stale+active+clean, last acting [4]
>>pg 2.4 is stuck stale for 8887559.656961, current state
>>stale+active+clean, last acting [4]
>>pg 6.0 is stuck stale for 8887559.656966, current state
>>stale+active+clean, last acting [4]
>>pg 3.6 is stuck stale for 8887559.656965, current state
>>stale+active+clean, last acting [4]
>>pg 3.7 is stuck stale for 8887559.656964, current state
>>stale+active+clean, last acting [4]
>>pg 2.6 is stuck stale for 8887559.656970, current state
>>stale+active+clean, last acting [4]
>>pg 0.3 is stuck stale for 8887559.656965, current state
>>stale+active+clean, last acting [4]
>>pg 5.6 is stuck stale for 8887559.656970, current state
>>stale+active+clean, last acting [4]
>>pg 7.4 is stuck stale for 8887559.656975, current state
>>stale+active+clean, last acting [4]
>>pg 3.1 is stuck stale for 8887559.656970, current state
>>stale+active+clean, last acting [4]
>>pg 6.4 is stuck stale for 8887559.656975, current state
>>stale+active+clean, last acting [4]
>>pg 5.4 is stuck stale for 8887559.656972, current state
>>stale+active+clean, last acting [4]
>>pg 2.3 is stuck stale for 8887559.656977, current state
>>stale+active+clean, last acting [4]
>>pg 5.5 is stuck stale for 8887559.656977, current state
>>stale+active+clean, last acting [4]
>>pg 3.3 is stuck stale for 8887559.656982, current state
>>stale+active+clean, last acting [4]
>>pg 5.7a is stuck stale for 8887559.657309, current state
>>stale+active+clean, last acting [4]
>>pg 6.78 is stuck stale for 8887559.657308, current state
>>stale+active+clean, last acting [4]
>>pg 5.78 is stuck stale for 8887559.657311, current state
>>stale+active+clean, last acting [4]
>>pg 5.79 is stuck stale for 8887559.657311, current state
>>stale+active+clean, last acting [4]
>>pg 6.7c is stuck stale for 8887559.657313, current state
>>stale+active+clean, last acting [4]
>>pg 7.7e is stuck stale for 8887559.657312, current state
>>stale+active+clean, last acting [4]
>>pg 6.7e is stuck stale for 8887559.657315, current state
>>stale+active+clean, last acting [4]
>>pg 7.70 is stuck stale for 8887559.657316, current state
>>stale+active+clean, last acting [4]
>>pg 6.73 is stuck stale for 8887559.657316, current state
>>stale+active+clean, last acting [4]
>>pg 5.77 is stuck stale for 8887559.657317, current state
>>stale+active+clean, last acting [4]
>>pg 5.74 is stuck stale for 8887559.657319, current state
>>stale+active+clean, last acting [4]
>>pg 5.75 is stuck stale for 8887559.657321, current state
>>stale+active+clean, last acting [4]
>>pg 7.68 is stuck stale for 8887559.657322, current state
>>stale+active+clean, last acting [4]
>>pg 6.68 is stuck stale for 8887559.657324, current state
>>stale+active+clean, last acting [4]
>>pg 7.6b is stuck stale for 8887559.657326, current state
>>stale+active+clean, last acting [4]
>>pg 6.6d is stuck stale for 8887559.657328, current state
>>stale+active+clean, last acting [4]
>>pg 5.6e is stuck stale for 8887559.657330, current state
>>stale+active+clean, last acting [4]
>>pg 6.6c is stuck stale for 8887559.657330, current state
>>stale+active+clean, last acting [4]
>>pg 7.6f is stuck stale for 8887559.657331, current state
>>stale+active+clean, last acting [4]
>>pg 7.60 is stuck stale for 8887559.657333, current state
>>stale+active+clean, last acting [4]
>>pg 6.60 is stuck stale for 8887559.657333, current state
>>stale+active+clean, last acting [4]
>>pg 7.62 is stuck stale for 8887559.657334, current state
>>stale+active+clean, last acting [4]
>>pg 6.65 is stuck stale for 8887559.657334, current state
>>stale+active+clean, last acting [4]
>>pg 7.64 is stuck stale for 8887559.657339, current state
>>stale+active+clean, last acting [4]
>>pg 5.67 is stuck stale for 8887559.657338, current state
>>stale+active+clean, last acting [4]
>>pg 7.66 is stuck stale for 8887559.657340, current state
>>stale+active+clean, last acting [4]
>>pg 6.66 is stuck stale for 8887559.657340, current state
>>stale+active+clean, last acting [4]
>>pg 7.67 is stuck stale for 8887559.657345, current state
>>stale+active+clean, last acting [4]
>>pg 6.59 is stuck stale for 8887559.657344, current state
>>stale+active+clean, last acting [4]
>>pg 7.58 is stuck stale for 8887559.657348, current state
>>stale+active+clean, last acting [4]
>>pg 6.58 is stuck stale for 8887559.657348, current state
>>stale+active+clean, last acting [4]
>>pg 7.59 is stuck stale for 8887559.657352, current state
>>stale+active+clean, last acting [4]
>>pg 6.5b is stuck stale for 8887559.657353, current state
>>stale+active+clean, last acting [4]
>>pg 5.59 is stuck stale for 8887559.657348, current state
>>stale+active+clean, last acting [4]
>>pg 6.5a is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 5.5e is stuck stale for 8887559.657352, current state
>>stale+active+clean, last acting [4]
>>pg 6.5d is stuck stale for 8887559.657358, current state
>>stale+active+clean, last acting [4]
>>pg 6.5f is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 7.51 is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 7.52 is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 7.53 is stuck stale for 8887559.657358, current state
>>stale+active+clean, last acting [4]
>>pg 6.55 is stuck stale for 8887559.657359, current state
>>stale+active+clean, last acting [4]
>>pg 7.54 is stuck stale for 8887559.657364, current state
>>stale+active+clean, last acting [4]
>>pg 6.54 is stuck stale for 8887559.657364, current state
>>stale+active+clean, last acting [4]
>>pg 6.57 is stuck stale for 8887559.657365, current state
>>stale+active+clean, last acting [4]
>>pg 7.56 is stuck stale for 8887559.657369, current state
>>stale+active+clean, last acting [4]
>>pg 5.55 is stuck stale for 8887559.657371, current state
>>stale+active+clean, last acting [4]
>>pg 7.48 is stuck stale for 8887559.657372, current state
>>stale+active+clean, last acting [4]
>>pg 6.49 is stuck stale for 8887559.657375, current state
>>stale+active+clean, last acting [4]
>>pg 5.4a is stuck stale for 8887559.657376, current state
>>stale+active+clean, last acting [4]
>>pg 6.48 is stuck stale for 8887559.657379, current state
>>stale+active+clean, last acting [4]
>>pg 7.4a is stuck stale for 8887559.657380, current state
>>stale+active+clean, last acting [4]
>>pg 6.4a is stuck stale for 8887559.657383, current state
>>stale+active+clean, last acting [4]
>>pg 6.4d is stuck stale for 8887559.657385, current state
>>stale+active+clean, last acting [4]
>>pg 7.4d is stuck stale for 8887559.657387, current state
>>stale+active+clean, last acting [4]
>>pg 6.4c is stuck stale for 8887559.657389, current state
>>stale+active+clean, last acting [4]
>>pg 6.4e is stuck stale for 8887559.657391, current state
>>stale+active+clean, last acting [4]
>>pg 5.42 is stuck stale for 8887559.657391, current state
>>stale+active+clean, last acting [4]
>>pg 6.43 is stuck stale for 8887559.657393, current state
>>stale+active+clean, last acting [4]
>>pg 5.41 is stuck stale for 8887559.657393, current state
>>stale+active+clean, last acting [4]
>>pg 5.47 is stuck stale for 8887559.657394, current state
>>stale+active+clean, last acting [4]
>>pg 7.46 is stuck stale for 8887559.657396, current state
>>stale+active+clean, last acting [4]
>>pg 6.39 is stuck stale for 8887559.657398, current state
>>stale+active+clean, last acting [4]
>>pg 5.3a is stuck stale for 8887559.657399, current state
>>stale+active+clean, last acting [4]
>>pg 2.3e is stuck stale for 8887559.657399, current state
>>stale+active+clean, last acting [4]
>>pg 0.3c is stuck stale for 8887559.657402, current state
>>stale+active+clean, last acting [4]
>>pg 7.3c is stuck stale for 8887559.657404, current state
>>stale+active+clean, last acting [4]
>>pg 7.3d is stuck stale for 8887559.657405, current state
>>stale+active+clean, last acting [4]
>>pg 0.39 is stuck stale for 8887559.657402, current state
>>stale+active+clean, last acting [4]
>>pg 5.3c is stuck stale for 8887559.657405, current state
>>stale+active+clean, last acting [4]
>>pg 2.3a is stuck stale for 8887559.657406, current state
>>stale+active+clean, last acting [4]
>>pg 0.38 is stuck stale for 8887559.657409, current state
>>stale+active+clean, last acting [4]
>>pg 2.35 is stuck stale for 8887559.657411, current state
>>stale+active+clean, last acting [4]
>>pg 0.37 is stuck stale for 8887559.657412, current state
>>stale+active+clean, last acting [4]
>>pg 5.32 is stuck stale for 8887559.657413, current state
>>stale+active+clean, last acting [4]
>>pg 2.34 is stuck stale for 8887559.657416, current state
>>stale+active+clean, last acting [4]
>>pg 0.36 is stuck stale for 8887559.657416, current state
>>stale+active+clean, last acting [4]
>>pg 7.32 is stuck stale for 8887559.657419, current state
>>stale+active+clean, last acting [4]
>>pg 6.33 is stuck stale for 8887559.657420, current state
>>stale+active+clean, last acting [4]
>>pg 0.35 is stuck stale for 8887559.657423, current state
>>stale+active+clean, last acting [4]
>>pg 6.35 is stuck stale for 8887559.657423, current state
>>stale+active+clean, last acting [4]
>>pg 5.36 is stuck stale for 8887559.657424, current state
>>stale+active+clean, last acting [4]
>>pg 2.30 is stuck stale for 8887559.657427, current state
>>stale+active+clean, last acting [4]
>>pg 5.37 is stuck stale for 8887559.657429, current state
>>stale+active+clean, last acting [4]
>>pg 7.36 is stuck stale for 8887559.657430, current state
>>stale+active+clean, last acting [4]
>>pg 6.37 is stuck stale for 8887559.657432, current state
>>stale+active+clean, last acting [4]
>>pg 6.28 is stuck stale for 8887559.657427, current state
>>stale+active+clean, last acting [4]
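To work with these stuck PGs as a list (to count them, or to run 'ceph pg map' on each), the pgids can be pulled out of 'ceph health detail'. A minimal sketch, assuming the command prints one PG per line in the format above (the break inside each entry here comes from mail wrapping). The function name stale_pgs_on_osd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph health detail` output on stdin and print
# the pgid of every "stuck stale" PG whose last acting set was exactly [OSD].
stale_pgs_on_osd() {
    awk -v want="[$1]" '
        # e.g. "pg 7.2a is stuck stale for ..., last acting [4]"
        $1 == "pg" && $4 == "stuck" && $5 == "stale" && $NF == want { print $2 }
    '
}

printf '%s\n' \
    'pg 7.2a is stuck stale for 8887559.656879, current state stale+active+clean, last acting [4]' \
    'pg 5.28 is stuck stale for 8887559.656886, current state stale+active+clean, last acting [4]' |
    stale_pgs_on_osd 4
# prints 7.2a and 5.28, one per line
```

On the live cluster, 'ceph health detail | stale_pgs_on_osd 4 | wc -l' should match the 155 stale PGs reported by 'ceph -s', and the same list can be fed to 'ceph pg map' in a loop.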
>>
>>
>>It stays that way, and I think this is because of what I found when I
>>downloaded and decompiled the CRUSH map:
>>@storage1:/tmp# crushtool -d /tmp/crushmap
>># begin crush map
>>tunable choose_local_tries 0
>>tunable choose_local_fallback_tries 0
>>tunable choose_total_tries 50
>>tunable chooseleaf_descend_once 1
>>
>># devices
>>device 0 osd.0
>>device 1 osd.1
>>device 2 osd.2
>>device 3 osd.3
>>device 4 device4
>>device 5 osd.5
>>device 6 osd.6
>>
>>
>>
>>Is there a way to remove this device 4 (aka osd.4) from here so Ceph can
>>make another copy from the other location shown by 'ceph pg map 2.3a'?
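Entries like 'device 4 device4' can be listed mechanically from a decompiled map. As far as I know, 'crushtool -d' prints an id that no longer has an OSD name attached as a 'deviceN' placeholder instead of omitting it, so such a line reflects a removed name rather than a live device; treat that reading as an assumption. The function name placeholder_devices is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a decompiled CRUSH map (crushtool -d output)
# on stdin and print the ids of "device N deviceN" placeholder entries.
placeholder_devices() {
    awk '$1 == "device" && $3 == ("device" $2) { print $2 }'
}

# Devices section as quoted in this thread:
printf '%s\n' '# devices' 'device 0 osd.0' 'device 1 osd.1' 'device 2 osd.2' \
    'device 3 osd.3' 'device 4 device4' 'device 5 osd.5' 'device 6 osd.6' |
    placeholder_devices
# prints 4
```

Against the live map this would be 'ceph osd getcrushmap -o /tmp/crushmap' followed by 'crushtool -d /tmp/crushmap | placeholder_devices'.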
>>
>>Regards.
>>
>>Dimitar Boichev
>>SysAdmin Team Lead
>>AXSMarine Sofia
>>Phone: +359 889 22 55 42
>>Skype: dimitar.boichev.axsmarine
>>E-mail:
>>dimitar.boichev@xxxxxxxxxxxxx
>>
>>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





