Re: osd not removed from crush map after ceph osd crush remove


 



Hello,

Thank you Bryan.

 

I was planning to upgrade to Hammer or later, but first I wanted to get the cluster into a healthy state.

Do you think it is safe to upgrade now, first to the latest Firefly release and then to Hammer?

 

 

Regards.

 

Dimitar Boichev

SysAdmin Team Lead

AXSMarine Sofia

Phone: +359 889 22 55 42

Skype: dimitar.boichev.axsmarine

E-mail: dimitar.boichev@xxxxxxxxxxxxx

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Stillwell, Bryan
Sent: Tuesday, February 23, 2016 1:51 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: osd not removed from crush map after ceph osd crush remove

 

Dimitar,

 

I'm not sure why those PGs would be stuck in the stale+active+clean state.  Maybe try upgrading to the 0.80.11 release to see if it's a bug that was fixed already?  You can use the 'ceph tell osd.* version' command after the upgrade to make sure all OSDs are running the new version.  Also since firefly (0.80.x) is near its EOL, you should consider upgrading to hammer (0.94.x). 
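A minimal sketch of that post-upgrade check, assuming each line of the `ceph tell osd.* version` output contains a string of the form `ceph version X.Y.Z`. The helper name `distinct_versions` and the sample text (including the placeholder hashes) are made up for illustration; the exact output layout may differ between releases.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct "ceph version X.Y.Z" strings
# found in the text read from stdin. One line of output means every
# OSD reported the same version.
distinct_versions() {
    grep -o 'ceph version [0-9][0-9.]*' | sort -u
}

# On a live cluster (quote the glob so the shell does not expand it):
#   ceph tell 'osd.*' version | distinct_versions

# Offline demonstration with fabricated sample text in the assumed format:
sample='osd.0: { "version": "ceph version 0.80.11 (hash-a)"}
osd.1: { "version": "ceph version 0.80.7 (hash-b)"}
osd.2: { "version": "ceph version 0.80.11 (hash-a)"}'
printf '%s\n' "$sample" | distinct_versions
```

If more than one line comes back, at least one OSD is probably still running the old binary and may need a restart.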

 

As for why osd.4 didn't get fully removed, the last command you ran isn't correct.  It should be 'ceph osd rm 4'.  Trying to remember when to use the CRUSH name (osd.4) versus the OSD number (4) can be a pain.
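The corrected sequence can be sketched as a dry run that only prints the commands, so nothing touches the cluster until the lines are reviewed. The function name `osd_removal_commands` is hypothetical, and the choice of name versus number per step follows the correction above plus the usual manual-removal procedure, which should be checked against the documentation for the release in use.

```shell
#!/bin/sh
# Print (do not execute) the removal commands for one OSD.
# $1 is the numeric OSD id, for example 4.
osd_removal_commands() {
    id="$1"
    case "$id" in
        ''|*[!0-9]*)
            echo "usage: osd_removal_commands <numeric-id>" >&2
            return 1
            ;;
    esac
    echo "ceph osd out $id"                 # OSD number
    echo "ceph osd crush remove osd.$id"    # CRUSH name
    echo "ceph auth del osd.$id"            # auth entity name
    echo "ceph osd rm $id"                  # OSD number, not osd.N
}

osd_removal_commands 4
```

Rejecting anything that is not a bare number keeps the `osd.4` versus `4` confusion from slipping into the last step.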

 

Bryan

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>
Date: Monday, February 22, 2016 at 1:10 AM
To: Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: osd not removed from crush map after ceph osd crush remove

 

Anyone?

 

Regards.

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Dimitar Boichev
Sent: Thursday, February 18, 2016 5:06 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: osd not removed from crush map after ceph osd crush remove

 

Hello,

I am running a tiny cluster of 2 nodes.

ceph -v

ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

 

One osd died and I added a new osd (not replacing the old one).

After that I wanted to remove the failed osd completely from the cluster.

Here is what I did:

ceph osd reweight osd.4 0.0

ceph osd crush reweight osd.4 0.0

ceph osd out osd.4

ceph osd crush remove osd.4

ceph auth del osd.4

ceph osd rm osd.4

 

 

But after the rebalancing I ended up with 155 PGs in the stale+active+clean state.

 

@storage1:/tmp# ceph -s

    cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3

     health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; nodeep-scrub flag(s) set

     monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch 1, quorum 0 storage1

     osdmap e1064: 6 osds: 6 up, 6 in

            flags nodeep-scrub

      pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects

            1209 GB used, 14210 GB / 15419 GB avail

                 155 stale+active+clean

                 557 active+clean

  client io 91925 B/s wr, 5 op/s

 

I know about the single-monitor problem. I just want to get the cluster back to a healthy state first; then I will add the third storage node and go up to 3 monitors.

 

The problem is as follows:

@storage1:/tmp# ceph pg map 2.3a

osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]

@storage1:/tmp# ceph pg 2.3a query

Error ENOENT: i don't have pgid 2.3a

 

 

@storage1:/tmp# ceph health detail

HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; 1 osds have slow requests; nodeep-scrub flag(s) set

pg 7.2a is stuck stale for 8887559.656879, current state stale+active+clean, last acting [4]

pg 5.28 is stuck stale for 8887559.656886, current state stale+active+clean, last acting [4]

pg 7.2b is stuck stale for 8887559.656889, current state stale+active+clean, last acting [4]

pg 7.2c is stuck stale for 8887559.656892, current state stale+active+clean, last acting [4]

pg 0.2b is stuck stale for 8887559.656893, current state stale+active+clean, last acting [4]

pg 6.2c is stuck stale for 8887559.656894, current state stale+active+clean, last acting [4]

pg 6.2f is stuck stale for 8887559.656893, current state stale+active+clean, last acting [4]

pg 2.2b is stuck stale for 8887559.656896, current state stale+active+clean, last acting [4]

pg 2.25 is stuck stale for 8887559.656896, current state stale+active+clean, last acting [4]

pg 6.20 is stuck stale for 8887559.656898, current state stale+active+clean, last acting [4]

pg 5.21 is stuck stale for 8887559.656898, current state stale+active+clean, last acting [4]

pg 0.24 is stuck stale for 8887559.656904, current state stale+active+clean, last acting [4]

pg 2.21 is stuck stale for 8887559.656904, current state stale+active+clean, last acting [4]

pg 5.27 is stuck stale for 8887559.656906, current state stale+active+clean, last acting [4]

pg 2.23 is stuck stale for 8887559.656908, current state stale+active+clean, last acting [4]

pg 6.26 is stuck stale for 8887559.656909, current state stale+active+clean, last acting [4]

pg 7.27 is stuck stale for 8887559.656913, current state stale+active+clean, last acting [4]

pg 7.18 is stuck stale for 8887559.656914, current state stale+active+clean, last acting [4]

pg 0.1e is stuck stale for 8887559.656914, current state stale+active+clean, last acting [4]

pg 6.18 is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]

pg 2.1f is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]

pg 7.1b is stuck stale for 8887559.656922, current state stale+active+clean, last acting [4]

pg 0.1b is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]

pg 6.1d is stuck stale for 8887559.656925, current state stale+active+clean, last acting [4]

pg 2.18 is stuck stale for 8887559.656920, current state stale+active+clean, last acting [4]

pg 7.1d is stuck stale for 8887559.656926, current state stale+active+clean, last acting [4]

pg 5.1c is stuck stale for 8887559.656921, current state stale+active+clean, last acting [4]

pg 5.1d is stuck stale for 8887559.656920, current state stale+active+clean, last acting [4]

pg 6.11 is stuck stale for 8887559.656922, current state stale+active+clean, last acting [4]

pg 5.13 is stuck stale for 8887559.656919, current state stale+active+clean, last acting [4]

pg 0.16 is stuck stale for 8887559.656924, current state stale+active+clean, last acting [4]

pg 6.10 is stuck stale for 8887559.656928, current state stale+active+clean, last acting [4]

pg 2.17 is stuck stale for 8887559.656927, current state stale+active+clean, last acting [4]

pg 7.12 is stuck stale for 8887559.656932, current state stale+active+clean, last acting [4]

pg 0.12 is stuck stale for 8887559.656929, current state stale+active+clean, last acting [4]

pg 6.14 is stuck stale for 8887559.656935, current state stale+active+clean, last acting [4]

pg 0.11 is stuck stale for 8887559.656932, current state stale+active+clean, last acting [4]

pg 7.16 is stuck stale for 8887559.656936, current state stale+active+clean, last acting [4]

pg 0.10 is stuck stale for 8887559.656936, current state stale+active+clean, last acting [4]

pg 2.d is stuck stale for 8887559.656933, current state stale+active+clean, last acting [4]

pg 6.9 is stuck stale for 8887559.656939, current state stale+active+clean, last acting [4]

pg 7.9 is stuck stale for 8887559.656939, current state stale+active+clean, last acting [4]

pg 0.d is stuck stale for 8887559.656940, current state stale+active+clean, last acting [4]

pg 7.a is stuck stale for 8887559.656944, current state stale+active+clean, last acting [4]

pg 0.c is stuck stale for 8887559.656941, current state stale+active+clean, last acting [4]

pg 2.e is stuck stale for 8887559.656947, current state stale+active+clean, last acting [4]

pg 6.a is stuck stale for 8887559.656953, current state stale+active+clean, last acting [4]

pg 0.b is stuck stale for 8887559.656949, current state stale+active+clean, last acting [4]

pg 2.9 is stuck stale for 8887559.656954, current state stale+active+clean, last acting [4]

pg 5.f is stuck stale for 8887559.656953, current state stale+active+clean, last acting [4]

pg 7.d is stuck stale for 8887559.656958, current state stale+active+clean, last acting [4]

pg 6.f is stuck stale for 8887559.656957, current state stale+active+clean, last acting [4]

pg 3.4 is stuck stale for 8887559.656957, current state stale+active+clean, last acting [4]

pg 5.3 is stuck stale for 8887559.656956, current state stale+active+clean, last acting [4]

pg 2.4 is stuck stale for 8887559.656961, current state stale+active+clean, last acting [4]

pg 6.0 is stuck stale for 8887559.656966, current state stale+active+clean, last acting [4]

pg 3.6 is stuck stale for 8887559.656965, current state stale+active+clean, last acting [4]

pg 3.7 is stuck stale for 8887559.656964, current state stale+active+clean, last acting [4]

pg 2.6 is stuck stale for 8887559.656970, current state stale+active+clean, last acting [4]

pg 0.3 is stuck stale for 8887559.656965, current state stale+active+clean, last acting [4]

pg 5.6 is stuck stale for 8887559.656970, current state stale+active+clean, last acting [4]

pg 7.4 is stuck stale for 8887559.656975, current state stale+active+clean, last acting [4]

pg 3.1 is stuck stale for 8887559.656970, current state stale+active+clean, last acting [4]

pg 6.4 is stuck stale for 8887559.656975, current state stale+active+clean, last acting [4]

pg 5.4 is stuck stale for 8887559.656972, current state stale+active+clean, last acting [4]

pg 2.3 is stuck stale for 8887559.656977, current state stale+active+clean, last acting [4]

pg 5.5 is stuck stale for 8887559.656977, current state stale+active+clean, last acting [4]

pg 3.3 is stuck stale for 8887559.656982, current state stale+active+clean, last acting [4]

pg 5.7a is stuck stale for 8887559.657309, current state stale+active+clean, last acting [4]

pg 6.78 is stuck stale for 8887559.657308, current state stale+active+clean, last acting [4]

pg 5.78 is stuck stale for 8887559.657311, current state stale+active+clean, last acting [4]

pg 5.79 is stuck stale for 8887559.657311, current state stale+active+clean, last acting [4]

pg 6.7c is stuck stale for 8887559.657313, current state stale+active+clean, last acting [4]

pg 7.7e is stuck stale for 8887559.657312, current state stale+active+clean, last acting [4]

pg 6.7e is stuck stale for 8887559.657315, current state stale+active+clean, last acting [4]

pg 7.70 is stuck stale for 8887559.657316, current state stale+active+clean, last acting [4]

pg 6.73 is stuck stale for 8887559.657316, current state stale+active+clean, last acting [4]

pg 5.77 is stuck stale for 8887559.657317, current state stale+active+clean, last acting [4]

pg 5.74 is stuck stale for 8887559.657319, current state stale+active+clean, last acting [4]

pg 5.75 is stuck stale for 8887559.657321, current state stale+active+clean, last acting [4]

pg 7.68 is stuck stale for 8887559.657322, current state stale+active+clean, last acting [4]

pg 6.68 is stuck stale for 8887559.657324, current state stale+active+clean, last acting [4]

pg 7.6b is stuck stale for 8887559.657326, current state stale+active+clean, last acting [4]

pg 6.6d is stuck stale for 8887559.657328, current state stale+active+clean, last acting [4]

pg 5.6e is stuck stale for 8887559.657330, current state stale+active+clean, last acting [4]

pg 6.6c is stuck stale for 8887559.657330, current state stale+active+clean, last acting [4]

pg 7.6f is stuck stale for 8887559.657331, current state stale+active+clean, last acting [4]

pg 7.60 is stuck stale for 8887559.657333, current state stale+active+clean, last acting [4]

pg 6.60 is stuck stale for 8887559.657333, current state stale+active+clean, last acting [4]

pg 7.62 is stuck stale for 8887559.657334, current state stale+active+clean, last acting [4]

pg 6.65 is stuck stale for 8887559.657334, current state stale+active+clean, last acting [4]

pg 7.64 is stuck stale for 8887559.657339, current state stale+active+clean, last acting [4]

pg 5.67 is stuck stale for 8887559.657338, current state stale+active+clean, last acting [4]

pg 7.66 is stuck stale for 8887559.657340, current state stale+active+clean, last acting [4]

pg 6.66 is stuck stale for 8887559.657340, current state stale+active+clean, last acting [4]

pg 7.67 is stuck stale for 8887559.657345, current state stale+active+clean, last acting [4]

pg 6.59 is stuck stale for 8887559.657344, current state stale+active+clean, last acting [4]

pg 7.58 is stuck stale for 8887559.657348, current state stale+active+clean, last acting [4]

pg 6.58 is stuck stale for 8887559.657348, current state stale+active+clean, last acting [4]

pg 7.59 is stuck stale for 8887559.657352, current state stale+active+clean, last acting [4]

pg 6.5b is stuck stale for 8887559.657353, current state stale+active+clean, last acting [4]

pg 5.59 is stuck stale for 8887559.657348, current state stale+active+clean, last acting [4]

pg 6.5a is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]

pg 5.5e is stuck stale for 8887559.657352, current state stale+active+clean, last acting [4]

pg 6.5d is stuck stale for 8887559.657358, current state stale+active+clean, last acting [4]

pg 6.5f is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]

pg 7.51 is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]

pg 7.52 is stuck stale for 8887559.657356, current state stale+active+clean, last acting [4]

pg 7.53 is stuck stale for 8887559.657358, current state stale+active+clean, last acting [4]

pg 6.55 is stuck stale for 8887559.657359, current state stale+active+clean, last acting [4]

pg 7.54 is stuck stale for 8887559.657364, current state stale+active+clean, last acting [4]

pg 6.54 is stuck stale for 8887559.657364, current state stale+active+clean, last acting [4]

pg 6.57 is stuck stale for 8887559.657365, current state stale+active+clean, last acting [4]

pg 7.56 is stuck stale for 8887559.657369, current state stale+active+clean, last acting [4]

pg 5.55 is stuck stale for 8887559.657371, current state stale+active+clean, last acting [4]

pg 7.48 is stuck stale for 8887559.657372, current state stale+active+clean, last acting [4]

pg 6.49 is stuck stale for 8887559.657375, current state stale+active+clean, last acting [4]

pg 5.4a is stuck stale for 8887559.657376, current state stale+active+clean, last acting [4]

pg 6.48 is stuck stale for 8887559.657379, current state stale+active+clean, last acting [4]

pg 7.4a is stuck stale for 8887559.657380, current state stale+active+clean, last acting [4]

pg 6.4a is stuck stale for 8887559.657383, current state stale+active+clean, last acting [4]

pg 6.4d is stuck stale for 8887559.657385, current state stale+active+clean, last acting [4]

pg 7.4d is stuck stale for 8887559.657387, current state stale+active+clean, last acting [4]

pg 6.4c is stuck stale for 8887559.657389, current state stale+active+clean, last acting [4]

pg 6.4e is stuck stale for 8887559.657391, current state stale+active+clean, last acting [4]

pg 5.42 is stuck stale for 8887559.657391, current state stale+active+clean, last acting [4]

pg 6.43 is stuck stale for 8887559.657393, current state stale+active+clean, last acting [4]

pg 5.41 is stuck stale for 8887559.657393, current state stale+active+clean, last acting [4]

pg 5.47 is stuck stale for 8887559.657394, current state stale+active+clean, last acting [4]

pg 7.46 is stuck stale for 8887559.657396, current state stale+active+clean, last acting [4]

pg 6.39 is stuck stale for 8887559.657398, current state stale+active+clean, last acting [4]

pg 5.3a is stuck stale for 8887559.657399, current state stale+active+clean, last acting [4]

pg 2.3e is stuck stale for 8887559.657399, current state stale+active+clean, last acting [4]

pg 0.3c is stuck stale for 8887559.657402, current state stale+active+clean, last acting [4]

pg 7.3c is stuck stale for 8887559.657404, current state stale+active+clean, last acting [4]

pg 7.3d is stuck stale for 8887559.657405, current state stale+active+clean, last acting [4]

pg 0.39 is stuck stale for 8887559.657402, current state stale+active+clean, last acting [4]

pg 5.3c is stuck stale for 8887559.657405, current state stale+active+clean, last acting [4]

pg 2.3a is stuck stale for 8887559.657406, current state stale+active+clean, last acting [4]

pg 0.38 is stuck stale for 8887559.657409, current state stale+active+clean, last acting [4]

pg 2.35 is stuck stale for 8887559.657411, current state stale+active+clean, last acting [4]

pg 0.37 is stuck stale for 8887559.657412, current state stale+active+clean, last acting [4]

pg 5.32 is stuck stale for 8887559.657413, current state stale+active+clean, last acting [4]

pg 2.34 is stuck stale for 8887559.657416, current state stale+active+clean, last acting [4]

pg 0.36 is stuck stale for 8887559.657416, current state stale+active+clean, last acting [4]

pg 7.32 is stuck stale for 8887559.657419, current state stale+active+clean, last acting [4]

pg 6.33 is stuck stale for 8887559.657420, current state stale+active+clean, last acting [4]

pg 0.35 is stuck stale for 8887559.657423, current state stale+active+clean, last acting [4]

pg 6.35 is stuck stale for 8887559.657423, current state stale+active+clean, last acting [4]

pg 5.36 is stuck stale for 8887559.657424, current state stale+active+clean, last acting [4]

pg 2.30 is stuck stale for 8887559.657427, current state stale+active+clean, last acting [4]

pg 5.37 is stuck stale for 8887559.657429, current state stale+active+clean, last acting [4]

pg 7.36 is stuck stale for 8887559.657430, current state stale+active+clean, last acting [4]

pg 6.37 is stuck stale for 8887559.657432, current state stale+active+clean, last acting [4]

pg 6.28 is stuck stale for 8887559.657427, current state stale+active+clean, last acting [4]
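A listing this long is easier to read as a tally. The sketch below counts stuck-stale PGs per "last acting" set, assuming the line format shown above with the bracketed set as the last whitespace-separated field. The function name `stale_by_acting` is hypothetical.

```shell
#!/bin/sh
# Count stuck-stale PGs per "last acting" set from `ceph health detail`
# text read on stdin.
stale_by_acting() {
    awk '/is stuck stale/ && /last acting/ {
             n[$NF]++    # $NF is the bracketed set, e.g. [4]
         }
         END { for (k in n) print k, n[k] }' | sort
}

# On a live cluster:
#   ceph health detail | stale_by_acting

# Offline demonstration using two lines copied from the output above:
printf '%s\n' \
  'pg 7.2a is stuck stale for 8887559.656879, current state stale+active+clean, last acting [4]' \
  'pg 5.28 is stuck stale for 8887559.656886, current state stale+active+clean, last acting [4]' \
  | stale_by_acting
# prints: [4] 2
```

Run against the full output, a single `[4]` row would confirm that every stale PG was last served only by the removed OSD.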

 

 

It stays that way, and I think the reason is what I found when I downloaded and decompiled the CRUSH map:

@storage1:/tmp# crushtool -d /tmp/crushmap

# begin crush map

tunable choose_local_tries 0

tunable choose_local_fallback_tries 0

tunable choose_total_tries 50

tunable chooseleaf_descend_once 1

 

# devices

device 0 osd.0

device 1 osd.1

device 2 osd.2

device 3 osd.3

device 4 device4

device 5 osd.5

device 6 osd.6

 

 

 

Is there a way to remove this device 4 (aka osd.4) from here so that Ceph can make another copy from the other location shown in “ceph pg map 2.3a”?
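To spot entries like `device 4 device4` without reading the whole map, the sketch below prints every device line whose name is not of the form `osd.<id>`. The function name `non_osd_devices` is hypothetical. One possible reading, stated here as an assumption to verify rather than a fact from this thread, is that crushtool prints a `deviceN` name as a placeholder for an id with no OSD attached, which would mean the `ceph osd crush remove` step did take effect.

```shell
#!/bin/sh
# Print "<id> <name>" for each device line in decompiled CRUSH map text
# (stdin) whose name does not look like osd.<id>.
non_osd_devices() {
    awk '$1 == "device" && $3 !~ /^osd\.[0-9]+$/ { print $2, $3 }'
}

# On a live cluster, mirroring the steps described above:
#   ceph osd getcrushmap -o /tmp/crushmap
#   crushtool -d /tmp/crushmap | non_osd_devices

# Offline demonstration using lines copied from the map above:
printf '%s\n' \
  'device 3 osd.3' \
  'device 4 device4' \
  'device 5 osd.5' \
  | non_osd_devices
# prints: 4 device4
```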

 

Regards.

 


 

 


