Hi,

if I shut down an OSD, it gets marked down after 20 seconds; after 300 seconds the OSD should get marked out and the cluster should resync. But that doesn't happen: the OSD stays in the state down/in forever, so the cluster stays degraded forever. I can reproduce this on a freshly installed cluster. If I manually mark the OSD out (ceph osd out 1), the resync starts immediately.

I think this is a release-critical bug, because the cluster health is not recovered automatically. I reported this behaviour a while ago:
http://article.gmane.org/gmane.comp.file-systems.ceph.user/603/

-martin

Log:

root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e204: 24 osds: 24 up, 24 in
   pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB used, 173 TB / 174 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# ceph --version
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)

root@store1:~# /etc/init.d/ceph stop osd.1
=== osd.1 ===
Stopping Ceph osd.1 on store1...bash: warning: setlocale: LC_ALL: cannot change locale (en_GB.utf8)
kill 5492...done

root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e204: 24 osds: 24 up, 24 in
   pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB used, 173 TB / 174 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# date -R
Thu, 25 Apr 2013 13:09:54 +0200

root@store1:~# ceph -s && date -R
   health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery 10999/269486 degraded (4.081%); 1/24 in osds are down
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e206: 24 osds: 23 up, 24 in
   pgmap v106715: 5056 pgs: 4633 active+clean, 423 active+degraded; 526 GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
   mdsmap e1: 0/0/1 up
Thu, 25 Apr 2013 13:10:14 +0200

root@store1:~# ceph -s && date -R
   health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery 10999/269486 degraded (4.081%); 1/24 in osds are down
   monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
   osdmap e206: 24 osds: 23 up, 24 in
   pgmap v106719: 5056 pgs: 4633 active+clean, 423 active+degraded; 526 GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
   mdsmap e1: 0/0/1 up
Thu, 25 Apr 2013 13:23:01 +0200
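For anyone who wants to rule out the usual configuration causes before digging into the monitor, this is a minimal sketch of what can be checked (the admin socket path is the stock default for mon.a; adjust it to your setup):

ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd_down_out_interval   # the mark-out timer the monitor is using, 300 by default
ceph osd dump | grep flags   # a 'noout' flag here would suppress the automatic mark-out
ceph osd out 1               # manual workaround: mark the OSD out by hand so recovery starts

If the interval is set and noout is not, the monitor should mark a down OSD out on its own after the interval expires.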
On 25.04.2013 01:46, Sage Weil wrote:
> Hi everyone-
>
> We are down to a handful of urgent bugs (3!) and a cuttlefish release date
> that is less than a week away. Thank you to everyone who has been
> involved in coding, testing, and stabilizing this release. We are close!
>
> If you would like to test the current release candidate, your efforts
> would be much appreciated! For deb systems, you can do
>
> wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc' | sudo apt-key add -
> echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/next $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
>
> For rpm users you can find packages at
>
> http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/next/
> http://gitbuilder.ceph.com/ceph-rpm-fc17-x86_64-basic/ref/next/
> http://gitbuilder.ceph.com/ceph-rpm-fc18-x86_64-basic/ref/next/
>
> A draft of the release notes is up at
>
> http://ceph.com/docs/master/release-notes/#v0-61
>
> Let me know if I've missed anything!
>
> sage
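For deb users who already have a cluster running: after adding the key and the repository quoted above, something like the following should pull in the release candidate packages (assuming the usual package names; adjust to your setup):

sudo apt-get update
sudo apt-get install ceph ceph-common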