Ceph doesn't mark out whole racks by default; set
mon_osd_down_out_subtree_limit to something higher, such as row or pod.
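For example, roughly like this (a sketch, not a verified recipe:
"ceph config set" assumes Mimic or newer, on Luminous put the option
into ceph.conf under [mon] or use "ceph tell mon.* injectargs";
replace <id> with one of your mon ids):

  # Show the current value; the default is "rack", which is why a
  # whole down rack is never marked out automatically.
  ceph daemon mon.<id> config get mon_osd_down_out_subtree_limit

  # Raise the limit so a single down rack can still be marked out.
  ceph config set mon mon_osd_down_out_subtree_limit row

  # Related: how long a down OSD waits before being marked out
  # (the 600 second default mentioned below).
  ceph daemon mon.<id> config get mon_osd_down_out_interval

The limit only affects automatic out-marking; you can always mark the
OSDs out by hand with "ceph osd out <osd-id>".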
Paul

2018-08-24 10:50 GMT+02:00 Christian Balzer <chibi@xxxxxxx>:
> Hello,
>
> On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:
>
>> Hi!
>>
>> I waited for about an hour.
>>
> Aside from verifying those timeout values in your cluster, what's
> your mon_osd_down_out_subtree_limit set to?
>
> Christian
>
>> ----- Original Message -----
>> From: "Wido den Hollander" <wido@xxxxxxxx>
>> To: "Fyodor Ustinov" <ufm@xxxxxx>, ceph-users@xxxxxxxxxxxxxx
>> Sent: Friday, 24 August, 2018 09:52:23
>> Subject: Re: ceph auto repair. What is wrong?
>>
>> On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
>> > Hi!
>> >
>> > I have a fresh Ceph cluster: 12 hosts with 3 OSDs on each host
>> > (one HDD and two SSDs). Each host is located in its own rack.
>> >
>> > I made the following CRUSH configuration on the fresh installation:
>> >
>> > sudo ceph osd crush add-bucket R-26-3-1 rack
>> > sudo ceph osd crush add-bucket R-26-3-2 rack
>> > sudo ceph osd crush add-bucket R-26-4-1 rack
>> > sudo ceph osd crush add-bucket R-26-4-2 rack
>> > [...]
>> > sudo ceph osd crush add-bucket R-26-8-1 rack
>> > sudo ceph osd crush add-bucket R-26-8-2 rack
>> >
>> > sudo ceph osd crush move R-26-3-1 root=default
>> > [...]
>> > sudo ceph osd crush move R-26-8-2 root=default
>> >
>> > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
>> > [...]
>> > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
>> >
>> > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
>> > sudo ceph osd pool create rbd 256 256 replicated hddreplrule
>> > sudo ceph osd pool set rbd size 3
>> > sudo ceph osd pool set rbd min_size 2
>> >
>> > The OSD tree looks like this:
>> >
>> > ID  CLASS WEIGHT    TYPE NAME            STATUS REWEIGHT PRI-AFF
>> >  -1       117.36346 root default
>> >  -2         9.78029     rack R-26-3-1
>> > -27         9.78029         host S-26-3-1-1
>> >   0   hdd   9.32390             osd.0        up  1.00000 1.00000
>> >   1   ssd   0.22820             osd.1        up  1.00000 1.00000
>> >   2   ssd   0.22820             osd.2        up  1.00000 1.00000
>> >  -3         9.78029     rack R-26-3-2
>> > -43         9.78029         host S-26-3-2-1
>> >   3   hdd   9.32390             osd.3        up  1.00000 1.00000
>> >   4   ssd   0.22820             osd.4        up  1.00000 1.00000
>> >   5   ssd   0.22820             osd.5        up  1.00000 1.00000
>> > [...]
>> >
>> > Now I write some data to the rbd pool and shut down one node:
>> >
>> >   cluster:
>> >     id:     9000d700-8529-4d38-b9f5-24d6079429a2
>> >     health: HEALTH_WARN
>> >             3 osds down
>> >             1 host (3 osds) down
>> >             1 rack (3 osds) down
>> >             Degraded data redundancy: 1223/12300 objects degraded
>> >             (9.943%), 74 pgs degraded, 74 pgs undersized
>> >
>> > And Ceph does not try to repair the pool. Why?
>>
>> How long did you wait? The default timeout is 600 seconds before
>> recovery starts.
>>
>> These OSDs are not marked as out yet.
>>
>> Wido
>>
>> >
>> > WBR,
>> >     Fyodor.
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Rakuten Communications

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com