On Tue, 9 Sep 2014, yuelongguang wrote:
> hi, all
>
> that is crazy.
> 1.
> all my osds are down, but ceph -s tells they are up and in. why?

Peer OSDs normally handle failure detection. If all OSDs are down, there is
nobody to report the failures. After 5 or 10 minutes, if the OSDs don't
report any stats to the monitor, it will eventually assume they are dead and
mark them down.

> 2.
> now all osds are down, a vm is using rbd as its disk, and inside vm fio is
> r/wing the disk, but it hang, can not be killed. why ?

The IOs will block indefinitely until the cluster is available. Once the
OSDs are started, the VM will become responsive again.

sage

>
> thanks
>
> [root@cephosd2-monb ~]# ceph -v
> ceph version 0.81 (8de9501df275a5fe29f2c64cb44f195130e4a8fc)
>
> [root@cephosd2-monb ~]# ceph -s
>     cluster 508634f6-20c9-43bb-bc6f-b777f4bb1651
>      health HEALTH_WARN mds 0 is laggy
>      monmap e13: 3 mons at {cephosd1-mona=10.154.249.3:6789/0,cephosd2-monb=10.154.249.4:6789/0,cephosd3-monc=10.154.249.5:6789/0}, election epoch 154, quorum 0,1,2 cephosd1-mona,cephosd2-monb,cephosd3-monc
>      mdsmap e21: 1/1/1 up {0=0=up:active(laggy or crashed)}
>      osdmap e196: 5 osds: 5 up, 5 in
>       pgmap v21836: 512 pgs, 5 pools, 3115 MB data, 805 objects
>             9623 MB used, 92721 MB / 102344 MB avail
>                  512 active+clean
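
For anyone reading this in the archive, here is a toy Python model of the two
detection paths described in the reply: reports from surviving peer OSDs, and
the monitor's fallback timeout on missing stats. It is only a sketch and not
Ceph code. The names Osd, monitor_tick and report_timeout are hypothetical,
and the 600 second timeout is an illustrative value, not a Ceph default.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    osd_id: int
    running: bool           # whether the daemon is actually alive
    last_report: float      # time of the last stats report the monitor saw
    marked_up: bool = True  # what the monitor's osdmap currently claims


def monitor_tick(osds, now, report_timeout):
    """Update marked_up for dead OSDs; return how many are still shown as up."""
    any_peer_alive = any(o.running for o in osds)
    for osd in osds:
        if osd.running or not osd.marked_up:
            continue
        if any_peer_alive:
            # Path 1: a surviving peer notices the failure and reports it.
            osd.marked_up = False
        elif now - osd.last_report > report_timeout:
            # Path 2: nobody is left to report, so only the timeout applies.
            osd.marked_up = False
    return sum(o.marked_up for o in osds)


# All five OSDs die at t=0, as in the cluster above.
osds = [Osd(i, running=False, last_report=0.0) for i in range(5)]
print(monitor_tick(osds, now=60.0, report_timeout=600.0))   # 5
print(monitor_tick(osds, now=601.0, report_timeout=600.0))  # 0
```

With every OSD down there is no peer left to report anything, so the model
keeps showing "5 up" until the timeout expires, which matches the ceph -s
output quoted above.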