> In the monitor log you sent along, the monitor was crashing on a
> setcrushmap command. Where in this sequence of events did that happen?

It happened after I tried to upload a different crushmap, much later, at step 13.

> Where are you getting these numbers 82-84 and 92-94 from? They don't
> appear in any of the maps you've sent along...

Sorry, this is the crushmap from after the OSDs broke:
https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap14-2.txt

> Can you provide us a tarball of one of your monitor directories?

https://dl.dropboxusercontent.com/u/2296931/ceph/ceph-mon.1.tar.bz2

2013/7/19 Gregory Farnum <greg@xxxxxxxxxxx>:
> In the monitor log you sent along, the monitor was crashing on a
> setcrushmap command. Where in this sequence of events did that happen?
>
> On Wed, Jul 17, 2013 at 5:07 PM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
>> That's what I did:
>>
>> Cluster state: HEALTH_OK.
>>
>> 1. Load the crush map from the cluster:
>> https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap1.txt
>>
>> 2. Modify the crush map to add a pool and ruleset iscsi with 2
>> datacenters, then upload the crush map to the cluster:
>> https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap2.txt
>>
>> 3. Add host gstore1:
>>
>> ceph-deploy osd create gstore1:/dev/sdh:/dev/sdb1
>> ceph-deploy osd create gstore1:/dev/sdj:/dev/sdc1
>> ceph-deploy osd create gstore1:/dev/sdk:/dev/sdc2
>>
>> 4. Move these OSDs to datacenter datacenter-cod:
>> ceph osd crush set 82 0 root=iscsi datacenter=datacenter-cod host=gstore1
>> ceph osd crush set 83 0 root=iscsi datacenter=datacenter-cod host=gstore1
>> ceph osd crush set 84 0 root=iscsi datacenter=datacenter-cod host=gstore1
>>
>> 5. Cluster state HEALTH_OK; reweight the new OSDs:
>> ceph osd crush reweight osd.82 2.73
>> ceph osd crush reweight osd.83 2.73
>> ceph osd crush reweight osd.84 2.73
>>
>> 6. Exclude osd.57 (in the default pool) from the cluster:
>> ceph osd crush reweight osd.57 0
>> Cluster state: HEALTH_WARN.
>>
>> 7. Add a new node gstore2, same as gstore1:
>> ceph-deploy -v osd create gstore2:/dev/sdh:/dev/sdb1
>> ceph osd crush set 94 2.73 root=iscsi datacenter=datacenter-rcod host=gstore2
>
> Where are you getting these numbers 82-84 and 92-94 from? They don't
> appear in any of the maps you've sent along...
>
>
>> 8. Exclude osd.56 (in the default pool) from the cluster:
>> ceph osd crush reweight osd.57 0
>>
>> 9. Add new OSDs to gstore2:
>> ceph-deploy osd create gstore2:/dev/sdl:/dev/sdd1
>> ceph-deploy osd create gstore2:/dev/sdm:/dev/sdd2
>> …
>> ceph-deploy osd create gstore2:/dev/sds:/dev/sdg2
>>
>> 10. Rename the pool iscsi (in the default crush pool):
>> ceph osd pool rename iscsi iscsi-old
>>
>> 11. Create a new pool iscsi:
>> ceph osd pool create iscsi 2048 2048
>>
>> 12. Set ruleset iscsi on the new pool iscsi:
>> ceph osd pool set iscsi crush_ruleset 3
>>
>> All OSDs went down with a segmentation fault.
>
> Okay, so you switched to actually start using the new rule and the
> OSDs broke. It's possible there's a hole in our crush map testing that
> would let this through.
>
>> 13. Fall back to ruleset 0 for pool iscsi:
>> ceph osd pool set iscsi crush_ruleset 0
>>
>> Delete ruleset iscsi and upload the crushmap to the cluster:
>> https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap14-new.txt
>>
>> The OSDs still crash with the segmentation fault.
>
> Yeah, once you've put a bad map into the system then you can't fix it
> by putting in a good one — all the OSDs need to evaluate the past maps
> on startup, which includes the bad one, which makes them crash again.
> :(
>
> Can you provide us a tarball of one of your monitor directories? We'd
> like to do some forensics on it to identify the scenario precisely and
> prevent it from happening again.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
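As an aside, here is a minimal sketch of how a candidate rule can be dry-run offline before injecting the map. This assumes crushtool is available on an admin node; the rule number 3 and replica count 2 are taken from the steps above, the file names are illustrative, and the exact test options available vary between crushtool releases:

# Grab and decompile the CRUSH map currently in use.
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# After editing, recompile and dry-run the candidate rule (ruleset 3)
# with the intended replica count, listing placements that fail.
crushtool -c crush.txt -o crush.new
crushtool -i crush.new --test --rule 3 --num-rep 2 --show-bad-mappings

# Inject the map only if the test output looks sane.
ceph osd setcrushmap -i crush.new

If the tester itself has the hole Greg suspects, this will not necessarily catch the bad rule, but it does flag rules that produce missing or bad mappings before they reach the cluster.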