Hi,

if I shut down an OSD, it gets marked down after 20 seconds; after 300 seconds the OSD should get marked out and the cluster should resync. But that doesn't happen: the OSD stays in the state down/in forever, so the cluster stays degraded forever. I can reproduce this on a freshly installed cluster. If I manually mark the OSD out (ceph osd out 1), the resync starts immediately.

I think this is a release-critical bug, because the cluster health is not recovered automatically. I reported this behaviour a while ago:
http://article.gmane.org/gmane.comp.file-systems.ceph.user/603/

-martin

Log:

root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e204: 24 osds: 24 up, 24 in
   pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB used, 173 TB / 174 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# ceph --version
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)

root@store1:~# /etc/init.d/ceph stop osd.1
=== osd.1 ===
Stopping Ceph osd.1 on store1...bash: warning: setlocale: LC_ALL: cannot change locale (en_GB.utf8)
kill 5492...done

root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e204: 24 osds: 24 up, 24 in
   pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB used, 173 TB / 174 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# date -R
Thu, 25 Apr 2013 13:09:54 +0200

root@store1:~# ceph -s && date -R
   health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery 10999/269486 degraded (4.081%); 1/24 in osds are down
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e206: 24 osds: 23 up, 24 in
   pgmap v106715: 5056 pgs: 4633 active+clean, 423 active+degraded; 526 GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
   mdsmap e1: 0/0/1 up
Thu, 25 Apr 2013 13:10:14 +0200

root@store1:~# ceph -s && date -R
   health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery 10999/269486 degraded (4.081%); 1/24 in osds are down
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e206: 24 osds: 23 up, 24 in
   pgmap v106719: 5056 pgs: 4633 active+clean, 423 active+degraded; 526 GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
   mdsmap e1: 0/0/1 up
Thu, 25 Apr 2013 13:23:01 +0200
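For anyone who wants to rule out the usual configuration causes before digging into the monitor, this is a minimal sketch of what can be checked (the admin socket path is the stock default for mon.a; adjust it to your setup):

ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd_down_out_interval   # the mark-out timer the monitor is using, 300 by default
ceph osd dump | grep flags   # a 'noout' flag here would suppress the automatic mark-out
ceph osd out 1               # manual workaround: mark the OSD out by hand so recovery starts

If the interval is set and noout is not, the monitor should mark a down OSD out on its own after the interval expires.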
On 25.04.2013 01:46, Sage Weil wrote:
> Hi everyone-
>
> We are down to a handful of urgent bugs (3!) and a cuttlefish release date
> that is less than a week away. Thank you to everyone who has been
> involved in coding, testing, and stabilizing this release. We are close!
>
> If you would like to test the current release candidate, your efforts
> would be much appreciated! For deb systems, you can do
>
> wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc' | sudo apt-key add -
> echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/next $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
>
> For rpm users you can find packages at
>
> http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/next/
> http://gitbuilder.ceph.com/ceph-rpm-fc17-x86_64-basic/ref/next/
> http://gitbuilder.ceph.com/ceph-rpm-fc18-x86_64-basic/ref/next/
>
> A draft of the release notes is up at
>
> http://ceph.com/docs/master/release-notes/#v0-61
>
> Let me know if I've missed anything!
>
> sage
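For deb users who already have a cluster running: after adding the key and the repository quoted above, something like the following should pull in the release candidate packages (assuming the usual package names; adjust to your setup):

sudo apt-get update
sudo apt-get install ceph ceph-common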