> On 26 April 2016 at 19:39, Samuel Just <sjust@xxxxxxxxxx> wrote:
>
> I think? Probably worth reproducing on a vstart cluster to validate
> the fix. Didn't we introduce something in the mon to validate new
> crushmaps? Hammer maybe?

I ended up injecting a fixed CRUSHMap into osdmap 1432 and 1433 on this cluster.

For future reference, this is how I extracted the CRUSHMap out of each OSDMap:

$ for i in {1392..1450}; do find -name "osdmap*${i}*" -exec osdmaptool --export-crush /tmp/crush.${i} {} \;; crushtool -d /tmp/crush.${i} -o /tmp/crush.${i}.txt; done

Tracing the logs, I found that it went wrong somewhere around OSDMap 1430. I inspected those maps manually and traced the root cause to 1432 and 1433.

I fixed the CRUSHMap and compiled it again:

$ crushtool -c /tmp/crush.1432.txt -o /tmp/crush.1432.new

Afterwards I injected it again:

$ find /var/lib/ceph/osd/ceph-*/current/meta -name 'osdmap.1432*' | xargs -n 1 osdmaptool --import-crush /tmp/crush.1432.new

The OSDs now start and keep running. The cluster isn't stable yet, but at least I have the OSDs back up and running.

Wido

> -Sam
>
> On Tue, Apr 26, 2016 at 8:09 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >> On 26 April 2016 at 16:58, Samuel Just <sjust@xxxxxxxxxx> wrote:
> >>
> >> Can you attach the OSDMap (ceph osd getmap -o <mapfile>)?
> >> -Sam
> >>
> > Henrik contacted me to look at this and this is what I found:
> >
> > 0x0000000000b18b81 in crush_choose_firstn (map=map@entry=0x1f00200, bucket=0x0, weight=weight@entry=0x1f2b880, weight_max=weight_max@entry=30, x=x@entry=1731224833, numrep=2, type=1, out=0x7fffdc036508, outpos=0, out_size=2, tries=51, recurse_tries=1, local_retries=0, local_fallback_retries=0, recurse_to_leaf=1, vary_r=0, out2=0x7fffdc036510, parent_r=0) at crush/mapper.c:345
> > 345     crush/mapper.c: No such file or directory.
> >
> > A bit more output from GDB:
> >
> > #0  0x0000000000b18b81 in crush_choose_firstn (map=map@entry=0x1f00200, bucket=0x0, weight=weight@entry=0x1f2b880, weight_max=weight_max@entry=30, x=x@entry=1731224833, numrep=2, type=1, out=0x7fffdc036508, outpos=0, out_size=2, tries=51, recurse_tries=1, local_retries=0, local_fallback_retries=0, recurse_to_leaf=1, vary_r=0, out2=0x7fffdc036510, parent_r=0) at crush/mapper.c:345
> > #1  0x0000000000b194cb in crush_do_rule (map=0x1f00200, ruleno=<optimized out>, x=1731224833, result=0x7fffdc036520, result_max=<optimized out>, weight=0x1f2b880, weight_max=30, scratch=<optimized out>) at crush/mapper.c:794
> > #2  0x0000000000a61680 in do_rule (weight=std::vector of length 30, capacity 30 = {...}, maxout=2, out=std::vector of length 0, capacity 0, x=1731224833, rule=0, this=0x1f72340) at ./crush/CrushWrapper.h:939
> > #3  OSDMap::_pg_to_osds (this=this@entry=0x1f46800, pool=..., pg=..., osds=osds@entry=0x7fffdc036600, primary=primary@entry=0x7fffdc0365ec, ppps=0x7fffdc0365f4) at osd/OSDMap.cc:1417
> >
> > It seems that CRUSH can't find entries in the CRUSHMap (note bucket=0x0 in the crush_choose_firstn frame). In this case the 'root default' was removed while the default ruleset still refers to it.
> >
> > The cluster is running 0.80.11.
> >
> > I extracted the CRUSHMaps from the OSDMaps on osd.0:
> >
> > $ for i in {1392..1450}; do find -name "osdmap*${i}*" -exec osdmaptool --export-crush /tmp/crush.${i} {} \;; crushtool -d /tmp/crush.${i} -o /tmp/crush.${i}.txt; done
> >
> > Here I see that in map 1433 the root 'default' doesn't exist, but the crush ruleset refers to 'bucket0'. This crushmap is attached.
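As a side note, crushtool can replay a rule against a compiled map offline, which is a handy way to watch a broken map fail (and to verify a repaired one) before importing anything. A rough sketch, not something that was run on this particular cluster, using the files extracted above, rule 0 and 2 replicas (the pool size here):

$ crushtool -i /tmp/crush.1433 --test --rule 0 --num-rep 2 --show-statistics
$ crushtool -i /tmp/crush.1432.new --test --rule 0 --num-rep 2 --show-bad-mappings

On the broken 1433 map the rule takes a bucket that doesn't exist, so the statistics should show inputs that fail to map; on the repaired map, --show-bad-mappings should print nothing at all before the map is imported back with osdmaptool.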
> >
> > rule replicated_ruleset {
> >     ruleset 0
> >     type replicated
> >     min_size 1
> >     max_size 10
> >     step take bucket0
> >     step chooseleaf firstn 0 type host
> >     step emit
> > }
> >
> > The root bucket0 doesn't exist.
> >
> > bucket0 seems like something that was created by Ceph/CRUSH and not by the user.
> >
> > I'm thinking about injecting a fixed CRUSHMap into this OSDMap where bucket0 does exist. Does that seem like a sane thing to do?
> >
> > Wido
> >
> >> On Tue, Apr 26, 2016 at 2:07 AM, Henrik Svensson <henrik.svensson@xxxxxxxxxx> wrote:
> >>
> >> > Hi!
> >> >
> >> > We have a three-node Ceph cluster with 10 OSDs each.
> >> >
> >> > We bought 3 new machines with an additional 30 disks that will reside in another location.
> >> > Before adding these machines we modified the default CRUSH map.
> >> >
> >> > After modifying the (default) CRUSH map with these commands, the cluster went down:
> >> >
> >> > ————————————————
> >> > ceph osd crush add-bucket dc1 datacenter
> >> > ceph osd crush add-bucket dc2 datacenter
> >> > ceph osd crush add-bucket availo datacenter
> >> > ceph osd crush move dc1 root=default
> >> > ceph osd crush move lkpsx0120 root=default datacenter=dc1
> >> > ceph osd crush move lkpsx0130 root=default datacenter=dc1
> >> > ceph osd crush move lkpsx0140 root=default datacenter=dc1
> >> > ceph osd crush move dc2 root=default
> >> > ceph osd crush move availo root=default
> >> > ceph osd crush add-bucket sectra root
> >> > ceph osd crush move dc1 root=sectra
> >> > ceph osd crush move dc2 root=sectra
> >> > ceph osd crush move dc3 root=sectra
> >> > ceph osd crush move availo root=sectra
> >> > ceph osd crush remove default
> >> > ————————————————
> >> >
> >> > I tried to revert the CRUSH map, but with no luck:
> >> >
> >> > ————————————————
> >> > ceph osd crush add-bucket default root
> >> > ceph osd crush move lkpsx0120 root=default
> >> > ceph osd crush move lkpsx0130 root=default
> >> > ceph osd crush move lkpsx0140 root=default
> >> > ceph osd crush remove sectra
> >> > ————————————————
> >> >
> >> > After trying to restart the cluster (and even the machines), no OSD started up again.
> >> > But ceph osd tree gave this output, claiming certain OSDs are up (even though the processes are not running):
> >> >
> >> > ————————————————
> >> > # id   weight  type name          up/down  reweight
> >> > -1     163.8   root default
> >> > -2     54.6        host lkpsx0120
> >> > 0      5.46            osd.0      down     0
> >> > 1      5.46            osd.1      down     0
> >> > 2      5.46            osd.2      down     0
> >> > 3      5.46            osd.3      down     0
> >> > 4      5.46            osd.4      down     0
> >> > 5      5.46            osd.5      down     0
> >> > 6      5.46            osd.6      down     0
> >> > 7      5.46            osd.7      down     0
> >> > 8      5.46            osd.8      down     0
> >> > 9      5.46            osd.9      down     0
> >> > -3     54.6        host lkpsx0130
> >> > 10     5.46            osd.10     down     0
> >> > 11     5.46            osd.11     down     0
> >> > 12     5.46            osd.12     down     0
> >> > 13     5.46            osd.13     down     0
> >> > 14     5.46            osd.14     down     0
> >> > 15     5.46            osd.15     down     0
> >> > 16     5.46            osd.16     down     0
> >> > 17     5.46            osd.17     down     0
> >> > 18     5.46            osd.18     up       1
> >> > 19     5.46            osd.19     up       1
> >> > -4     54.6        host lkpsx0140
> >> > 20     5.46            osd.20     up       1
> >> > 21     5.46            osd.21     down     0
> >> > 22     5.46            osd.22     down     0
> >> > 23     5.46            osd.23     down     0
> >> > 24     5.46            osd.24     down     0
> >> > 25     5.46            osd.25     up       1
> >> > 26     5.46            osd.26     up       1
> >> > 27     5.46            osd.27     up       1
> >> > 28     5.46            osd.28     up       1
> >> > 29     5.46            osd.29     up       1
> >> > ————————————————
> >> >
> >> > The monitor starts/restarts OK (only one monitor exists).
> >> > But when starting one OSD, nothing shows up in ceph -w.
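Coming back to the 'step take bucket0' rule quoted further up: a crude text-level check over the decompiled maps will flag any rule that takes a bucket which is no longer defined in the same map. A rough, untested sketch against the /tmp/crush.*.txt files produced by the extraction loop earlier in this thread:

$ for f in /tmp/crush.*.txt; do for b in $(awk '$1=="step" && $2=="take" {print $3}' $f | sort -u); do grep -Eq "^[a-z]+ ${b} \{" $f || echo "$f: rule takes undefined bucket ${b}"; done; done

Every line it prints is a map in the same state as 1433: a rule pointing at a bucket that does not exist.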
> >> >
> >> > Here is the ceph mon_status:
> >> >
> >> > ————————————————
> >> > { "name": "lkpsx0120",
> >> >   "rank": 0,
> >> >   "state": "leader",
> >> >   "election_epoch": 1,
> >> >   "quorum": [
> >> >         0],
> >> >   "outside_quorum": [],
> >> >   "extra_probe_peers": [],
> >> >   "sync_provider": [],
> >> >   "monmap": { "epoch": 4,
> >> >       "fsid": "9244194a-5e10-47ae-9287-507944612f95",
> >> >       "modified": "0.000000",
> >> >       "created": "0.000000",
> >> >       "mons": [
> >> >             { "rank": 0,
> >> >               "name": "lkpsx0120",
> >> >               "addr": "10.15.2.120:6789\/0"}]}}
> >> > ————————————————
> >> >
> >> > Here is the ceph.conf file:
> >> >
> >> > ————————————————
> >> > [global]
> >> > fsid = 9244194a-5e10-47ae-9287-507944612f95
> >> > mon_initial_members = lkpsx0120
> >> > mon_host = 10.15.2.120
> >> > #debug osd = 20
> >> > #debug ms = 1
> >> > auth_cluster_required = cephx
> >> > auth_service_required = cephx
> >> > auth_client_required = cephx
> >> > filestore_xattr_use_omap = true
> >> > osd_crush_chooseleaf_type = 1
> >> > osd_pool_default_size = 2
> >> > public_network = 10.15.2.0/24
> >> > cluster_network = 10.15.4.0/24
> >> > rbd_cache = true
> >> > rbd_cache_size = 67108864
> >> > rbd_cache_max_dirty = 50331648
> >> > rbd_cache_target_dirty = 33554432
> >> > rbd_cache_max_dirty_age = 2
> >> > rbd_cache_writethrough_until_flush = true
> >> > ————————————————
> >> >
> >> > Here is the decompiled crush map:
> >> >
> >> > ————————————————
> >> > # begin crush map
> >> > tunable choose_local_tries 0
> >> > tunable choose_local_fallback_tries 0
> >> > tunable choose_total_tries 50
> >> > tunable chooseleaf_descend_once 1
> >> >
> >> > # devices
> >> > device 0 osd.0
> >> > device 1 osd.1
> >> > device 2 osd.2
> >> > device 3 osd.3
> >> > device 4 osd.4
> >> > device 5 osd.5
> >> > device 6 osd.6
> >> > device 7 osd.7
> >> > device 8 osd.8
> >> > device 9 osd.9
> >> > device 10 osd.10
> >> > device 11 osd.11
> >> > device 12 osd.12
> >> > device 13 osd.13
> >> > device 14 osd.14
> >> > device 15 osd.15
> >> > device 16 osd.16
> >> > device 17 osd.17
> >> > device 18 osd.18
> >> > device 19 osd.19
> >> > device 20 osd.20
> >> > device 21 osd.21
> >> > device 22 osd.22
> >> > device 23 osd.23
> >> > device 24 osd.24
> >> > device 25 osd.25
> >> > device 26 osd.26
> >> > device 27 osd.27
> >> > device 28 osd.28
> >> > device 29 osd.29
> >> >
> >> > # types
> >> > type 0 osd
> >> > type 1 host
> >> > type 2 chassis
> >> > type 3 rack
> >> > type 4 row
> >> > type 5 pdu
> >> > type 6 pod
> >> > type 7 room
> >> > type 8 datacenter
> >> > type 9 region
> >> > type 10 root
> >> >
> >> > # buckets
> >> > host lkpsx0120 {
> >> >     id -2    # do not change unnecessarily
> >> >     # weight 54.600
> >> >     alg straw
> >> >     hash 0   # rjenkins1
> >> >     item osd.0 weight 5.460
> >> >     item osd.1 weight 5.460
> >> >     item osd.2 weight 5.460
> >> >     item osd.3 weight 5.460
> >> >     item osd.4 weight 5.460
> >> >     item osd.5 weight 5.460
> >> >     item osd.6 weight 5.460
> >> >     item osd.7 weight 5.460
> >> >     item osd.8 weight 5.460
> >> >     item osd.9 weight 5.460
> >> > }
> >> > host lkpsx0130 {
> >> >     id -3    # do not change unnecessarily
> >> >     # weight 54.600
> >> >     alg straw
> >> >     hash 0   # rjenkins1
> >> >     item osd.10 weight 5.460
> >> >     item osd.11 weight 5.460
> >> >     item osd.12 weight 5.460
> >> >     item osd.13 weight 5.460
> >> >     item osd.14 weight 5.460
> >> >     item osd.15 weight 5.460
> >> >     item osd.16 weight 5.460
> >> >     item osd.17 weight 5.460
> >> >     item osd.18 weight 5.460
> >> >     item osd.19 weight 5.460
> >> > }
> >> > host lkpsx0140 {
> >> >     id -4    # do not change unnecessarily
> >> >     # weight 54.600
> >> >     alg straw
> >> >     hash 0   # rjenkins1
> >> >     item osd.20 weight 5.460
> >> >     item osd.21 weight 5.460
> >> >     item osd.22 weight 5.460
> >> >     item osd.23 weight 5.460
> >> >     item osd.24 weight 5.460
> >> >     item osd.25 weight 5.460
> >> >     item osd.26 weight 5.460
> >> >     item osd.27 weight 5.460
> >> >     item osd.28 weight 5.460
> >> >     item osd.29 weight 5.460
> >> > }
> >> > root default {
> >> >     id -1    # do not change unnecessarily
> >> >     # weight 163.800
> >> >     alg straw
> >> >     hash 0   # rjenkins1
> >> >     item lkpsx0120 weight 54.600
> >> >     item lkpsx0130 weight 54.600
> >> >     item lkpsx0140 weight 54.600
> >> > }
> >> >
> >> > # rules
> >> > rule replicated_ruleset {
> >> >     ruleset 0
> >> >     type replicated
> >> >     min_size 1
> >> >     max_size 10
> >> >     step take default
> >> >     step chooseleaf firstn 0 type host
> >> >     step emit
> >> > }
> >> >
> >> > # end crush map
> >> > ————————————————
> >> >
> >> > The operating system is Debian 8.0 and the Ceph version is 0.80.7, as stated in the crash log.
> >> >
> >> > We increased the log level and tried to start osd.1 as an example. All OSDs we tried to start hit the same problem and die.
> >> >
> >> > The log file from OSD 1 (ceph-osd.1.log) can be found here:
> >> > https://www.dropbox.com/s/dqunlufh0qtked5/ceph-osd.1.log.zip?dl=0
> >> >
> >> > As of now, all systems are down, including the KVM cluster that depends on Ceph.
> >> >
> >> > Best regards,
> >> >
> >> > Henrik
> >> > ------------------------------
> >> > Henrik Svensson
> >> > OpIT
> >> > Sectra AB
> >> > Teknikringen 20, 58330 Linköping, Sweden
> >> > E-mail: henrik.svensson@xxxxxxxxxx
> >> > Phone: +46 (0)13 352 884
> >> > Cellular: +46 (0)70 395141
> >> > Web: www.sectra.com <http://www.sectra.com/medical/>
> >> >
> >> > ------------------------------
> >> > This message is intended only for the addressee and may contain information that is confidential or privileged. Unauthorized use is strictly prohibited and may be unlawful.
> >> >
> >> > If you are not the addressee, you should not read, copy, disclose or otherwise use this message, except for the purpose of delivery to the addressee. If you have received this in error, please delete and advise us immediately.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
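For anyone who ends up in a similar spot: as long as the monitors will still accept a new map (i.e. before the OSDs start crashing on a bad one, which is what forced the osdmaptool surgery above), the whole CRUSH map can be exported, edited offline, tested and re-injected in one piece rather than reshuffled bucket by bucket. A rough sketch with standard commands, assuming the same rule 0 and pool size 2 as in this thread:

$ ceph osd getcrushmap -o crush.bin          # keep a copy of the current map first
$ crushtool -d crush.bin -o crush.txt        # decompile and edit crush.txt
$ crushtool -c crush.txt -o crush.new        # recompile
$ crushtool -i crush.new --test --rule 0 --num-rep 2 --show-bad-mappings
$ ceph osd setcrushmap -i crush.new          # only inject it if the test prints nothing

The saved crush.bin also doubles as a known-good backup to fall back on if an edit goes wrong.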