Hi!
Did not help. :(
HEALTH_WARN 3 osds down; 1 host (3 osds) down; 1 rack (3 osds) down; Degraded data redundancy: 112 pgs undersized
OSD_DOWN 3 osds down
    osd.24 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
    osd.25 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
    osd.26 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
OSD_HOST_DOWN 1 host (3 osds) down
    host S-26-7-1-1 (root=default,rack=R-26-7-1) (3 osds) is down
OSD_RACK_DOWN 1 rack (3 osds) down
    rack R-26-7-1 (root=default) (3 osds) is down
PG_DEGRADED Degraded data redundancy: 112 pgs undersized
    pg 2.0 is stuck undersized for 2466.145928, current state active+undersized, last acting [18,33]
    pg 2.6 is stuck undersized for 2466.144061, current state active+undersized, last acting [15,18]
    pg 2.1b is stuck undersized for 2466.143789, current state active+undersized, last acting [30,6]
    pg 2.22 is stuck undersized for 2466.141138, current state active+undersized, last acting [15,21]
[....]
[root@S-26-6-1-2 tmp]# ceph config dump
WHO  MASK  LEVEL     OPTION                          VALUE  RO
mon         advanced  mon_allow_pool_delete           true
mon         advanced  mon_osd_down_out_subtree_limit  pod    *
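For what it's worth, a quick way to confirm that the value above is what the running monitors actually use (a minimal sketch, assuming the Mimic-style config interface used in this thread; mon.S-26-6-1-2 is a placeholder for one of your mon IDs):

ceph config get mon mon_osd_down_out_subtree_limit                    # value stored in the config database
ceph daemon mon.S-26-6-1-2 config get mon_osd_down_out_subtree_limit  # value the running mon uses (run on that mon's host)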
On 08/24/18 17:12, Fyodor Ustinov wrote:
Hi!
I.e., do I have to run
ceph config set mon mon_osd_down_out_subtree_limit row
and then restart every mon?
On 08/24/18 12:44, Paul Emmerich wrote:
Ceph doesn't mark out whole racks by default; set
mon_osd_down_out_subtree_limit to something higher in the CRUSH
hierarchy, such as row or pod.
Paul
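A minimal sketch of the change Paul suggests, assuming the Mimic-style centralized config that the rest of this thread uses:

ceph config set mon mon_osd_down_out_subtree_limit row  # with the limit at row, a whole down rack can still be auto-marked out
ceph config get mon mon_osd_down_out_subtree_limit      # confirm the new value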
2018-08-24 10:50 GMT+02:00 Christian Balzer <chibi@xxxxxxx>:
Hello,
On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:
Hi!
I waited about an hour.
Aside from verifying those timeout values in your cluster, what's your
mon_osd_down_out_subtree_limit set to?
Christian
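A hedged sketch for checking both of the knobs Christian mentions on a running mon (mon.<id> is a placeholder for your mon name):

ceph daemon mon.<id> config get mon_osd_down_out_interval       # default: 600 seconds
ceph daemon mon.<id> config get mon_osd_down_out_subtree_limit  # default: rack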
----- Original Message -----
From: "Wido den Hollander" <wido@xxxxxxxx>
To: "Fyodor Ustinov" <ufm@xxxxxx>, ceph-users@xxxxxxxxxxxxxx
Sent: Friday, 24 August, 2018 09:52:23
Subject: Re: ceph auto repair. What is wrong?
On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
Hi!
I have a fresh Ceph cluster: 12 hosts, with 3 OSDs on each host (one
HDD and two SSDs). Each host is located in its own rack.
I made the following CRUSH configuration on the fresh Ceph installation:
sudo ceph osd crush add-bucket R-26-3-1 rack
sudo ceph osd crush add-bucket R-26-3-2 rack
sudo ceph osd crush add-bucket R-26-4-1 rack
sudo ceph osd crush add-bucket R-26-4-2 rack
[...]
sudo ceph osd crush add-bucket R-26-8-1 rack
sudo ceph osd crush add-bucket R-26-8-2 rack
sudo ceph osd crush move R-26-3-1 root=default
[...]
sudo ceph osd crush move R-26-8-2 root=default
sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
[...]
sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
sudo ceph osd pool create rbd 256 256 replicated hddreplrule
sudo ceph osd pool set rbd size 3
sudo ceph osd pool set rbd min_size 2
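A quick sanity check that the rule and pool settings took effect (a minimal sketch; all of these are standard ceph CLI commands):

sudo ceph osd crush rule dump hddreplrule  # failure domain should be "rack", device class "hdd"
sudo ceph osd pool get rbd crush_rule      # should report hddreplrule
sudo ceph osd pool get rbd size            # should report 3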
The osd tree looks like this:
ID  CLASS WEIGHT    TYPE NAME            STATUS REWEIGHT PRI-AFF
 -1       117.36346 root default
 -2         9.78029     rack R-26-3-1
-27         9.78029         host S-26-3-1-1
  0   hdd   9.32390             osd.0        up  1.00000 1.00000
  1   ssd   0.22820             osd.1        up  1.00000 1.00000
  2   ssd   0.22820             osd.2        up  1.00000 1.00000
 -3         9.78029     rack R-26-3-2
-43         9.78029         host S-26-3-2-1
  3   hdd   9.32390             osd.3        up  1.00000 1.00000
  4   ssd   0.22820             osd.4        up  1.00000 1.00000
  5   ssd   0.22820             osd.5        up  1.00000 1.00000
[...]
Now I write some data to the rbd pool and shut down one node.
  cluster:
    id:     9000d700-8529-4d38-b9f5-24d6079429a2
    health: HEALTH_WARN
            3 osds down
            1 host (3 osds) down
            1 rack (3 osds) down
            Degraded data redundancy: 1223/12300 objects degraded (9.943%), 74 pgs degraded, 74 pgs undersized
And Ceph does not try to repair the pool. Why?
How long did you wait? The default timeout is 600 seconds before
recovery starts.
These OSDs are not marked as out yet.
Wido
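For reference, the timeout Wido describes is mon_osd_down_out_interval (600 seconds by default). A minimal sketch for inspecting or lowering it, assuming the Mimic-style config interface used elsewhere in this thread:

ceph config get mon mon_osd_down_out_interval
ceph config set mon mon_osd_down_out_interval 300  # e.g. auto-mark down OSDs out after 5 minutes instead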
WBR,
Fyodor.
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com