Re: CEPH All OSD got segmentation fault after CRUSH edit

Samuel Just <sjust@xxxxxxxxxx> · Tue, 26 Apr 2016 10:39:33 -0700



I think so?  It is probably worth reproducing this on a vstart cluster
first to validate the fix.  Didn't we introduce something in the mon to
validate new crushmaps?  In Hammer, maybe?
-Sam
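
As a rough offline illustration of the kind of check meant above, here is
a minimal Python sketch that scans a decompiled crushmap (crushtool -d
output) for rules whose "step take" names a bucket that is not defined.
This is not the monitor's validation code; the function name
find_dangling_takes and the trimmed SAMPLE map are made up for this
example, and the parser only understands the simple layout seen in this
thread.

```python
import re

def find_dangling_takes(crush_text):
    """Return (rule, target) pairs whose 'step take' target is not a defined bucket."""
    types = set()      # bucket type names from 'type <n> <name>' lines
    headers = []       # (kind, name) for every '<kind> <name> {' block
    takes = []         # (rule, target) for every 'step take' inside a rule
    current_rule = None
    for raw in crush_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        m = re.fullmatch(r"type\s+\d+\s+(\S+)", line)
        if m:
            types.add(m.group(1))
            continue
        m = re.fullmatch(r"(\S+)\s+(\S+)\s*\{", line)
        if m:
            kind, name = m.groups()
            headers.append((kind, name))
            current_rule = name if kind == "rule" else None
            continue
        if line == "}":
            current_rule = None
            continue
        m = re.match(r"step\s+take\s+(\S+)", line)
        if m and current_rule is not None:
            takes.append((current_rule, m.group(1)))
    # Resolve at the end so the order of sections in the file does not matter.
    buckets = {name for kind, name in headers if kind in types}
    return [(rule, target) for rule, target in takes if target not in buckets]

# Hypothetical, heavily trimmed map in the shape of epoch 1433 from this thread.
SAMPLE = """\
type 1 host
type 10 root
host lkpsx0120 {
        id -2
        alg straw
        hash 0
        item osd.0 weight 5.460
}
rule replicated_ruleset {
        ruleset 0
        type replicated
        step take bucket0
        step chooseleaf firstn 0 type host
        step emit
}
"""

print(find_dangling_takes(SAMPLE))  # [('replicated_ruleset', 'bucket0')]
```

An empty list means every rule starts from a bucket that exists. It says
nothing about whether the resulting mappings are sensible, so testing the
compiled map (for example with crushtool's --test mode) is still
worthwhile before injecting anything.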

On Tue, Apr 26, 2016 at 8:09 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 26 April 2016 at 16:58, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>
>>
>> Can you attach the OSDMap (ceph osd getmap -o <mapfile>)?
>> -Sam
>>
>
> Henrik contacted me to look at this and this is what I found:
>
> 0x0000000000b18b81 in crush_choose_firstn (map=map@entry=0x1f00200, bucket=0x0, weight=weight@entry=0x1f2b880, weight_max=weight_max@entry=30, x=x@entry=1731224833, numrep=2, type=1, out=0x7fffdc036508, outpos=0, out_size=2, tries=51, recurse_tries=1, local_retries=0,
>     local_fallback_retries=0, recurse_to_leaf=1, vary_r=0, out2=0x7fffdc036510, parent_r=0) at crush/mapper.c:345
> 345     crush/mapper.c: No such file or directory.
>
> A bit more output from GDB:
>
> #0  0x0000000000b18b81 in crush_choose_firstn (map=map@entry=0x1f00200, bucket=0x0, weight=weight@entry=0x1f2b880, weight_max=weight_max@entry=30, x=x@entry=1731224833, numrep=2, type=1, out=0x7fffdc036508, outpos=0, out_size=2, tries=51, recurse_tries=1, local_retries=0,
>     local_fallback_retries=0, recurse_to_leaf=1, vary_r=0, out2=0x7fffdc036510, parent_r=0) at crush/mapper.c:345
> #1  0x0000000000b194cb in crush_do_rule (map=0x1f00200, ruleno=<optimized out>, x=1731224833, result=0x7fffdc036520, result_max=<optimized out>, weight=0x1f2b880, weight_max=30, scratch=<optimized out>) at crush/mapper.c:794
> #2  0x0000000000a61680 in do_rule (weight=std::vector of length 30, capacity 30 = {...}, maxout=2, out=std::vector of length 0, capacity 0, x=1731224833, rule=0, this=0x1f72340) at ./crush/CrushWrapper.h:939
> #3  OSDMap::_pg_to_osds (this=this@entry=0x1f46800, pool=..., pg=..., osds=osds@entry=0x7fffdc036600, primary=primary@entry=0x7fffdc0365ec, ppps=0x7fffdc0365f4) at osd/OSDMap.cc:1417
>
> It seems that CRUSH can't find the bucket the rule points at: frame #0 above is called with bucket=0x0, a NULL bucket pointer. In this case the 'root default' was removed while the default ruleset still refers to it.
>
> The cluster is running 0.80.11
>
> I extracted the CRUSHMaps from the OSDMaps on osd.0:
>
> $ for i in {1392..1450}; do find -name "osdmap*${i}*" -exec osdmaptool --export-crush /tmp/crush.${i} {} \;; crushtool -d /tmp/crush.${i} -o /tmp/crush.${i}.txt; done
>
> Here I see that in map 1433 the root 'default' doesn't exist, but the crush ruleset refers to 'bucket0'. This crushmap is attached.
>
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take bucket0
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> The root bucket0 doesn't exist.
>
> bucket0 seems like something which was created by Ceph/CRUSH and not by the user.
>
> I'm thinking about injecting a fixed CRUSHMap into this OSDMap where bucket0 does exist. Does that seem like a sane thing to do?
>
> Wido
>
>
>> On Tue, Apr 26, 2016 at 2:07 AM, Henrik Svensson <henrik.svensson@xxxxxxxxxx> wrote:
>>
>> > Hi!
>> >
>> > We have a three-node Ceph cluster with 10 OSDs per node.
>> >
>> > We bought 3 new machines with an additional 30 disks, which will be
>> > located at another site.
>> > Before adding these machines we modified the default CRUSH map.
>> >
>> > After modifying the (default) CRUSH map with these commands, the cluster
>> > went down:
>> >
>> > ————————————————
>> > ceph osd crush add-bucket dc1 datacenter
>> > ceph osd crush add-bucket dc2 datacenter
>> > ceph osd crush add-bucket availo datacenter
>> > ceph osd crush move dc1 root=default
>> > ceph osd crush move lkpsx0120 root=default datacenter=dc1
>> > ceph osd crush move lkpsx0130 root=default datacenter=dc1
>> > ceph osd crush move lkpsx0140 root=default datacenter=dc1
>> > ceph osd crush move dc2 root=default
>> > ceph osd crush move availo root=default
>> > ceph osd crush add-bucket sectra root
>> > ceph osd crush move dc1 root=sectra
>> > ceph osd crush move dc2 root=sectra
>> > ceph osd crush move dc3 root=sectra
>> > ceph osd crush move availo root=sectra
>> > ceph osd crush remove default
>> > ————————————————
>> >
>> > I tried to revert the CRUSH map, but with no luck:
>> >
>> > ————————————————
>> > ceph osd crush add-bucket default root
>> > ceph osd crush move lkpsx0120 root=default
>> > ceph osd crush move lkpsx0130 root=default
>> > ceph osd crush move lkpsx0140 root=default
>> > ceph osd crush remove sectra
>> > ————————————————
>> >
>> > After trying to restart the cluster (and even the machines), no OSD
>> > started up again.
>> > However, ceph osd tree gives this output, which reports certain OSDs as
>> > up even though their processes are not running:
>> >
>> > ————————————————
>> > # id    weight  type name           up/down  reweight
>> > -1      163.8   root default
>> > -2      54.6        host lkpsx0120
>> > 0       5.46            osd.0       down     0
>> > 1       5.46            osd.1       down     0
>> > 2       5.46            osd.2       down     0
>> > 3       5.46            osd.3       down     0
>> > 4       5.46            osd.4       down     0
>> > 5       5.46            osd.5       down     0
>> > 6       5.46            osd.6       down     0
>> > 7       5.46            osd.7       down     0
>> > 8       5.46            osd.8       down     0
>> > 9       5.46            osd.9       down     0
>> > -3      54.6        host lkpsx0130
>> > 10      5.46            osd.10      down     0
>> > 11      5.46            osd.11      down     0
>> > 12      5.46            osd.12      down     0
>> > 13      5.46            osd.13      down     0
>> > 14      5.46            osd.14      down     0
>> > 15      5.46            osd.15      down     0
>> > 16      5.46            osd.16      down     0
>> > 17      5.46            osd.17      down     0
>> > 18      5.46            osd.18      up       1
>> > 19      5.46            osd.19      up       1
>> > -4      54.6        host lkpsx0140
>> > 20      5.46            osd.20      up       1
>> > 21      5.46            osd.21      down     0
>> > 22      5.46            osd.22      down     0
>> > 23      5.46            osd.23      down     0
>> > 24      5.46            osd.24      down     0
>> > 25      5.46            osd.25      up       1
>> > 26      5.46            osd.26      up       1
>> > 27      5.46            osd.27      up       1
>> > 28      5.46            osd.28      up       1
>> > 29      5.46            osd.29      up       1
>> > ————————————————
>> >
>> > The monitor starts/restarts OK (only one monitor exists).
>> > But when I start an OSD while watching ceph -w, nothing shows up.
>> >
>> > Here is the ceph mon_status:
>> >
>> > ————————————————
>> > { "name": "lkpsx0120",
>> >   "rank": 0,
>> >   "state": "leader",
>> >   "election_epoch": 1,
>> >   "quorum": [
>> >         0],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 4,
>> >       "fsid": "9244194a-5e10-47ae-9287-507944612f95",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "lkpsx0120",
>> >               "addr": "10.15.2.120:6789\/0"}]}}
>> > ————————————————
>> >
>> > Here is the ceph.conf file
>> >
>> > ————————————————
>> > [global]
>> > fsid = 9244194a-5e10-47ae-9287-507944612f95
>> > mon_initial_members = lkpsx0120
>> > mon_host = 10.15.2.120
>> > #debug osd = 20
>> > #debug ms = 1
>> > auth_cluster_required = cephx
>> > auth_service_required = cephx
>> > auth_client_required = cephx
>> > filestore_xattr_use_omap = true
>> > osd_crush_chooseleaf_type = 1
>> > osd_pool_default_size = 2
>> > public_network = 10.15.2.0/24
>> > cluster_network = 10.15.4.0/24
>> > rbd_cache = true
>> > rbd_cache_size = 67108864
>> > rbd_cache_max_dirty = 50331648
>> > rbd_cache_target_dirty = 33554432
>> > rbd_cache_max_dirty_age = 2
>> > rbd_cache_writethrough_until_flush = true
>> > ————————————————
>> >
>> > Here is the decompiled crush map:
>> >
>> > ————————————————
>> > # begin crush map
>> > tunable choose_local_tries 0
>> > tunable choose_local_fallback_tries 0
>> > tunable choose_total_tries 50
>> > tunable chooseleaf_descend_once 1
>> >
>> > # devices
>> > device 0 osd.0
>> > device 1 osd.1
>> > device 2 osd.2
>> > device 3 osd.3
>> > device 4 osd.4
>> > device 5 osd.5
>> > device 6 osd.6
>> > device 7 osd.7
>> > device 8 osd.8
>> > device 9 osd.9
>> > device 10 osd.10
>> > device 11 osd.11
>> > device 12 osd.12
>> > device 13 osd.13
>> > device 14 osd.14
>> > device 15 osd.15
>> > device 16 osd.16
>> > device 17 osd.17
>> > device 18 osd.18
>> > device 19 osd.19
>> > device 20 osd.20
>> > device 21 osd.21
>> > device 22 osd.22
>> > device 23 osd.23
>> > device 24 osd.24
>> > device 25 osd.25
>> > device 26 osd.26
>> > device 27 osd.27
>> > device 28 osd.28
>> > device 29 osd.29
>> >
>> > # types
>> > type 0 osd
>> > type 1 host
>> > type 2 chassis
>> > type 3 rack
>> > type 4 row
>> > type 5 pdu
>> > type 6 pod
>> > type 7 room
>> > type 8 datacenter
>> > type 9 region
>> > type 10 root
>> >
>> > # buckets
>> > host lkpsx0120 {
>> >         id -2 # do not change unnecessarily
>> >         # weight 54.600
>> >         alg straw
>> >         hash 0 # rjenkins1
>> >         item osd.0 weight 5.460
>> >         item osd.1 weight 5.460
>> >         item osd.2 weight 5.460
>> >         item osd.3 weight 5.460
>> >         item osd.4 weight 5.460
>> >         item osd.5 weight 5.460
>> >         item osd.6 weight 5.460
>> >         item osd.7 weight 5.460
>> >         item osd.8 weight 5.460
>> >         item osd.9 weight 5.460
>> > }
>> > host lkpsx0130 {
>> >         id -3 # do not change unnecessarily
>> >         # weight 54.600
>> >         alg straw
>> >         hash 0 # rjenkins1
>> >         item osd.10 weight 5.460
>> >         item osd.11 weight 5.460
>> >         item osd.12 weight 5.460
>> >         item osd.13 weight 5.460
>> >         item osd.14 weight 5.460
>> >         item osd.15 weight 5.460
>> >         item osd.16 weight 5.460
>> >         item osd.17 weight 5.460
>> >         item osd.18 weight 5.460
>> >         item osd.19 weight 5.460
>> > }
>> > host lkpsx0140 {
>> >         id -4 # do not change unnecessarily
>> >         # weight 54.600
>> >         alg straw
>> >         hash 0 # rjenkins1
>> >         item osd.20 weight 5.460
>> >         item osd.21 weight 5.460
>> >         item osd.22 weight 5.460
>> >         item osd.23 weight 5.460
>> >         item osd.24 weight 5.460
>> >         item osd.25 weight 5.460
>> >         item osd.26 weight 5.460
>> >         item osd.27 weight 5.460
>> >         item osd.28 weight 5.460
>> >         item osd.29 weight 5.460
>> > }
>> > root default {
>> >         id -1 # do not change unnecessarily
>> >         # weight 163.800
>> >         alg straw
>> >         hash 0 # rjenkins1
>> >         item lkpsx0120 weight 54.600
>> >         item lkpsx0130 weight 54.600
>> >         item lkpsx0140 weight 54.600
>> > }
>> >
>> > # rules
>> > rule replicated_ruleset {
>> >         ruleset 0
>> >         type replicated
>> >         min_size 1
>> >         max_size 10
>> >         step take default
>> >         step chooseleaf firstn 0 type host
>> >         step emit
>> > }
>> >
>> > # end crush map
>> > ————————————————
>> >
>> > The operating system is Debian 8.0 and the Ceph version is 0.80.7, as
>> > stated in the crash log.
>> >
>> > We increased the log level and tried to start osd.1 as an example. Every
>> > OSD we tried to start hits the same problem and dies.
>> >
>> > The log file from OSD 1 (ceph-osd.1.log) can be found here:
>> > https://www.dropbox.com/s/dqunlufh0qtked5/ceph-osd.1.log.zip?dl=0
>> >
>> > As of now, all systems are down, including the KVM cluster that depends
>> > on Ceph.
>> >
>> > Best regards,
>> >
>> > Henrik
>> > ------------------------------
>> > *Henrik Svensson*
>> > OpIT
>> > Sectra AB
>> > Teknikringen 20, 58330 Linköping, Sweden
>> > E-mail: henrik.svensson@xxxxxxxxxx
>> > Phone: +46 (0)13 352 884
>> > Cellular: +46 (0)70 395141
>> > Web: *www.sectra.com* <http://www.sectra.com/medical/>
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >