Re: Cluster Map Problems

Looks like you either have a custom config or have specified
somewhere that OSDs shouldn't be marked out (i.e., set the 'noout'
flag). There can also be a bit of flux if your OSDs are reporting an
unusual number of failures, but you'd have seen failure reports if
that were going on.
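
As a quick check, the standard CLI can show and clear that flag (a
sketch; 'noout' appears on the osdmap flags line if it is set):

    ceph osd dump | grep flags    # "noout" here means down OSDs won't be marked out
    ceph osd unset noout          # clear the flag so the down->out timeout applies again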
-Greg

On Thu, Mar 28, 2013 at 10:35 AM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
> Hi Greg,
>
>  /etc/init.d/ceph stop osd.1
> === osd.1 ===
> Stopping Ceph osd.1 on store1...kill 13413...done
> root@store1:~# date -R
> Thu, 28 Mar 2013 18:22:05 +0100
> root@store1:~# ceph -s
>    health HEALTH_WARN 378 pgs degraded; 378 pgs stuck unclean; recovery
> 39/904 degraded (4.314%);  recovering 15E o/s, 15EB/s; 1/24 in osds are down
>    monmap e1: 3 mons at
> {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
> election epoch 6, quorum 0,1,2 a,b,c
>    osdmap e28: 24 osds: 23 up, 24 in
>     pgmap v449: 4800 pgs: 4422 active+clean, 378 active+degraded; 1800
> MB data, 3800 MB used, 174 TB / 174 TB avail; 39/904 degraded (4.314%);
>  recovering 15E o/s, 15EB/s
>    mdsmap e1: 0/0/1 up
>
>
> 10 mins later, still the same
>
> root@store1:~# date -R
> Thu, 28 Mar 2013 18:32:24 +0100
> root@store1:~# ceph -s
>    health HEALTH_WARN 378 pgs degraded; 378 pgs stuck unclean; recovery
> 39/904 degraded (4.314%); 1/24 in osds are down
>    monmap e1: 3 mons at
> {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
> election epoch 6, quorum 0,1,2 a,b,c
>    osdmap e28: 24 osds: 23 up, 24 in
>     pgmap v454: 4800 pgs: 4422 active+clean, 378 active+degraded; 1800
> MB data, 3780 MB used, 174 TB / 174 TB avail; 39/904 degraded (4.314%)
>    mdsmap e1: 0/0/1 up
>
> root@store1:~#
>
>
> -martin
>
> On 28.03.2013 16:38, Gregory Farnum wrote:
>> This is the perfectly normal distinction between "down" and "out". The
>> OSD has been marked down but there's a timeout period (default: 5
>> minutes) before it's marked "out" and the data gets reshuffled (to
>> avoid starting replication on a simple reboot, for instance).
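
For reference, the timeout described above is the monitor's down-out
interval. A ceph.conf sketch using the 300-second default of this era
(an assumed value, not one read from this cluster):

    [mon]
        ; seconds a down OSD may stay "in" before being marked "out"
        mon osd down out interval = 300

The live value can also be read through a monitor's admin socket (the
socket path below is an assumption; adjust to your deployment):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep down_out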