That's what I did (cluster state HEALTH_OK):

1. Loaded the crush map from the cluster: https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap1.txt

2. Modified the crush map to add the root and ruleset "iscsi" with 2 datacenters, and uploaded it to the cluster: https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap2.txt

3. Added host gstore1:
ceph-deploy osd create gstore1:/dev/sdh:/dev/sdb1
ceph-deploy osd create gstore1:/dev/sdj:/dev/sdc1
ceph-deploy osd create gstore1:/dev/sdk:/dev/sdc2

4. Moved these OSDs to datacenter datacenter-cod:
ceph osd crush set 82 0 root=iscsi datacenter=datacenter-cod host=gstore1
ceph osd crush set 83 0 root=iscsi datacenter=datacenter-cod host=gstore1
ceph osd crush set 84 0 root=iscsi datacenter=datacenter-cod host=gstore1

5. Cluster state HEALTH_OK; reweighted the new OSDs (see the placement-check sketch at the end of this message):
ceph osd crush reweight osd.82 2.73
ceph osd crush reweight osd.83 2.73
ceph osd crush reweight osd.84 2.73

6. Excluded osd.57 (in the default root) from the cluster:
ceph osd crush reweight osd.57 0
Cluster state HEALTH_WARN.

7. Added a new node gstore2, set up the same way as gstore1:
ceph-deploy -v osd create gstore2:/dev/sdh:/dev/sdb1
ceph osd crush set 94 2.73 root=iscsi datacenter=datacenter-rcod host=gstore2

8. Excluded osd.56 (in the default root) from the cluster:
ceph osd crush reweight osd.57 0

9. Added new OSDs to gstore2:
ceph-deploy osd create gstore2:/dev/sdl:/dev/sdd1
ceph-deploy osd create gstore2:/dev/sdm:/dev/sdd2
…
ceph-deploy osd create gstore2:/dev/sds:/dev/sdg2

10. Renamed the existing pool iscsi (still using the default crush ruleset) to iscsi-old:
ceph osd pool rename iscsi iscsi-old

11. Created a new pool iscsi:
ceph osd pool create iscsi 2048 2048

12. Set the iscsi ruleset on the new iscsi pool (see the crushtool dry-run sketch at the end of this message):
ceph osd pool set iscsi crush_ruleset 3
All OSDs went down with a segmentation fault.

13. Fell back to ruleset 0 for pool iscsi:
ceph osd pool set iscsi crush_ruleset 0
Then deleted the iscsi ruleset and uploaded the crush map to the cluster: https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap14-new.txt
The OSDs still crash with a segmentation fault.

2013/7/18 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Wed, Jul 17, 2013 at 4:40 AM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
>> Sorry, did not send this to ceph-users earlier.
>>
>> I checked the mon.1 log and found that the cluster was not in HEALTH_OK when the ruleset was set on iscsi:
>> 2013-07-14 15:52:15.715871 7fe8a852a700 0 log [INF] : pgmap v16861121: 19296 pgs: 19052 active+clean, 73 active+remapped+wait_backfill, 171 active+remapped+backfilling; 9023 GB data, 18074 GB used, 95096 GB / 110 TB avail; 21245KB/s rd, 1892KB/s wr, 443op/s; 49203/4696557 degraded (1.048%)
>> 2013-07-14 15:52:15.870389 7fe8a852a700 0 mon.1@0(leader) e23 handle_command mon_command(osd pool set iscsi crush_ruleset 3 v 0) v1
>> ...
>> 2013-07-14 15:52:35.930465 7fe8a852a700 1 mon.1@0(leader).osd e77415 prepare_failure osd.2 10.166.10.27:6801/12007 from osd.56 10.166.10.29:6896/18516 is reporting failure:1
>> 2013-07-14 15:52:35.930641 7fe8a852a700 0 log [DBG] : osd.2 10.166.10.27:6801/12007 reported failed by osd.56 10.166.10.29:6896/18516
>>
>
> Okay, I think you need to back up and provide a simple timeline of what you did and what you know about the cluster state at that time. I'm particularly interested in anything you did after the OSDs started crashing, but I want to know about what happened before as well.
>
>> Could this be an indicator that a bad map was distributed to the cluster's OSD servers by osd.56? Does this mean that you cannot change the crushmap of the cluster unless it is in HEALTH_OK, or you lose the whole cluster?
>
> You can absolutely change the crush map on a cluster which is in an unhealthy state. That's not the problem, at least not on its own.
>
>> full log at https://dl.dropboxusercontent.com/u/2296931/ceph/ceph-mon.1.log.bak.zip (1.7MB)
>>
>>> If a bad map somehow got distributed to the OSDs then cleaning it up is unfortunately going to take a lot of work without any well-defined processes.
>> This means that all data was lost?
>
> *If* that actually happened somehow (it shouldn't be able to happen, generally), then depending on how much time and money you are willing to invest you might have lost it, yes.
>
> As I look at your monitor log, it looks like it crashed whenever you tried to inject the crush map at about 2013-07-14 16:54:57. Is that when your OSDs started crashing, or was something wrong with them before that?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
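
For reference, below is a rough sketch of how a modified crush map and a new ruleset can be tested offline before applying them to the cluster. The file names are only examples, the rule number and replica count are the ones from step 12 above, and the exact crushtool test flags may vary between versions (check crushtool --help):

# pull the current crush map from the cluster and decompile it to text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt (e.g. add the iscsi root and ruleset), then recompile
crushtool -c crushmap.txt -o crushmap.new
# dry-run mappings for the new rule (rule 3, 2 replicas) without touching the cluster
crushtool -i crushmap.new --test --rule 3 --num-rep 2 --show-statistics
# only upload the map once the test output looks sane
ceph osd setcrushmap -i crushmap.new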
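
Likewise, a small sketch of the checks that show where a new OSD has landed before and after reweighting; the id, weight and location here are simply the values from steps 4-5 above:

# place an OSD in the iscsi root with its target weight in one step
ceph osd crush set 82 2.73 root=iscsi datacenter=datacenter-cod host=gstore1
# or change only the weight of an item that is already placed
ceph osd crush reweight osd.82 2.73
# confirm hierarchy, weights and overall cluster state before the next change
ceph osd tree
ceph -s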