Re: ceph auto repair. What is wrong?

On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
> Hi!
> 
> I have a fresh ceph cluster: 12 hosts with 3 OSDs on each host (one HDD and two SSDs). Each host is located in its own rack.
> 
> I made the following CRUSH configuration on the fresh ceph installation:
> 
>    sudo ceph osd crush add-bucket R-26-3-1 rack
>    sudo ceph osd crush add-bucket R-26-3-2 rack
>    sudo ceph osd crush add-bucket R-26-4-1 rack
>    sudo ceph osd crush add-bucket R-26-4-2 rack
> [...]
>    sudo ceph osd crush add-bucket R-26-8-1 rack
>    sudo ceph osd crush add-bucket R-26-8-2 rack
> 
>    sudo ceph osd crush move R-26-3-1 root=default
> [...]
>    sudo ceph osd crush move R-26-8-2 root=default
> 
>     sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
> [...]
>     sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
> 
>     sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
>     sudo ceph osd pool create rbd 256 256 replicated hddreplrule
>     sudo ceph osd pool set rbd size 3
>     sudo ceph osd pool set rbd min_size 2
> 
> The osd tree looks like this:
> ID  CLASS WEIGHT    TYPE NAME               STATUS REWEIGHT PRI-AFF
>  -1       117.36346 root default
>  -2         9.78029     rack R-26-3-1
> -27         9.78029         host S-26-3-1-1
>   0   hdd   9.32390             osd.0           up  1.00000 1.00000
>   1   ssd   0.22820             osd.1           up  1.00000 1.00000
>   2   ssd   0.22820             osd.2           up  1.00000 1.00000
>  -3         9.78029     rack R-26-3-2
> -43         9.78029         host S-26-3-2-1
>   3   hdd   9.32390             osd.3           up  1.00000 1.00000
>   4   ssd   0.22820             osd.4           up  1.00000 1.00000
>   5   ssd   0.22820             osd.5           up  1.00000 1.00000
> [...]
> 
> 
> Now I write some data to the rbd pool and shut down one node.
>   cluster:
>     id:     9000d700-8529-4d38-b9f5-24d6079429a2
>     health: HEALTH_WARN
>             3 osds down
>             1 host (3 osds) down
>             1 rack (3 osds) down
>             Degraded data redundancy: 1223/12300 objects degraded (9.943%), 74 pgs degraded, 74 pgs undersized
> 
> And ceph does not try to repair the pool. Why?

How long did you wait? By default the monitors wait 600 seconds before
marking a down OSD as out, and recovery only starts after that.
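
That timeout is mon_osd_down_out_interval. As a rough sketch (assuming you
run this on a monitor host and substitute your own monitor id for <id>),
you can check the current value and, if you want, lower it at runtime:

    ceph daemon mon.<id> config get mon_osd_down_out_interval
    ceph tell mon.* injectargs '--mon_osd_down_out_interval 300'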

These OSDs are not marked as out yet.
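
If you do not want to wait for that interval, you can mark the down OSDs
out yourself and recovery will begin immediately. A minimal example,
assuming the failed host carried osd.0, osd.1 and osd.2:

    ceph osd out 0 1 2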

Wido

> 
> WBR,
>     Fyodor.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
