Hello,

On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:

> Hi!
>
> I waited about an hour.
>
Aside from verifying those timeout values in your cluster, what's your
mon_osd_down_out_subtree_limit set to? (See the P.S. below for a quick
way to check both.)

Christian

> ----- Original Message -----
> From: "Wido den Hollander" <wido@xxxxxxxx>
> To: "Fyodor Ustinov" <ufm@xxxxxx>, ceph-users@xxxxxxxxxxxxxx
> Sent: Friday, 24 August, 2018 09:52:23
> Subject: Re: ceph auto repair. What is wrong?
>
> On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
> > Hi!
> >
> > I have a fresh Ceph cluster: 12 hosts with 3 OSDs each (one HDD and
> > two SSDs). Each host sits in its own rack.
> >
> > I applied the following CRUSH configuration to the fresh installation:
> >
> > sudo ceph osd crush add-bucket R-26-3-1 rack
> > sudo ceph osd crush add-bucket R-26-3-2 rack
> > sudo ceph osd crush add-bucket R-26-4-1 rack
> > sudo ceph osd crush add-bucket R-26-4-2 rack
> > [...]
> > sudo ceph osd crush add-bucket R-26-8-1 rack
> > sudo ceph osd crush add-bucket R-26-8-2 rack
> >
> > sudo ceph osd crush move R-26-3-1 root=default
> > [...]
> > sudo ceph osd crush move R-26-8-2 root=default
> >
> > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
> > [...]
> > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
> >
> > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
> > sudo ceph osd pool create rbd 256 256 replicated hddreplrule
> > sudo ceph osd pool set rbd size 3
> > sudo ceph osd pool set rbd min_size 2
> >
> > The osd tree looks like this:
> >
> > ID  CLASS WEIGHT    TYPE NAME            STATUS REWEIGHT PRI-AFF
> >  -1       117.36346 root default
> >  -2         9.78029     rack R-26-3-1
> > -27         9.78029         host S-26-3-1-1
> >   0   hdd   9.32390             osd.0        up  1.00000 1.00000
> >   1   ssd   0.22820             osd.1        up  1.00000 1.00000
> >   2   ssd   0.22820             osd.2        up  1.00000 1.00000
> >  -3         9.78029     rack R-26-3-2
> > -43         9.78029         host S-26-3-2-1
> >   3   hdd   9.32390             osd.3        up  1.00000 1.00000
> >   4   ssd   0.22820             osd.4        up  1.00000 1.00000
> >   5   ssd   0.22820             osd.5        up  1.00000 1.00000
> > [...]
> >
> > Now I write some data to the rbd pool and shut down one node:
> >
> >   cluster:
> >     id:     9000d700-8529-4d38-b9f5-24d6079429a2
> >     health: HEALTH_WARN
> >             3 osds down
> >             1 host (3 osds) down
> >             1 rack (3 osds) down
> >             Degraded data redundancy: 1223/12300 objects degraded
> >             (9.943%), 74 pgs degraded, 74 pgs undersized
> >
> > And Ceph does not try to repair the pool. Why?
>
> How long did you wait? The default timeout (mon_osd_down_out_interval)
> is 600 seconds before recovery starts.
>
> These OSDs are not marked as out yet.
>
> Wido
>
> > WBR,
> >     Fyodor.

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications
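
P.S. Both values can be read back from a running monitor via its admin
socket. A minimal sketch, assuming you run it on a mon host and that the
mon id matches the short hostname (adjust mon.$(hostname -s) otherwise):

  # Seconds a down OSD waits before being marked out (default: 600).
  sudo ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval

  # Smallest CRUSH unit type that is NOT marked out automatically when
  # it fails as a whole (default: rack). With an entire rack down, as in
  # your HEALTH_WARN output, the default value would explain why the
  # OSDs never get marked out and recovery never starts.
  sudo ceph daemon mon.$(hostname -s) config get mon_osd_down_out_subtree_limit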
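
If the subtree limit is indeed what is holding things back, you can mark
the down OSDs out by hand to start recovery immediately. The ids 0-2
below are just the ones from the osd tree quoted above; substitute the
ids of the host you actually shut down:

  # Manually mark the down OSDs out so CRUSH remaps their PGs to the
  # remaining racks and backfill/recovery can begin.
  sudo ceph osd out 0 1 2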