ceph osd tree and crushmap output wrong

Hi All,


I'm on the ceph-users list but haven't had any luck there with this.


Following are the pertinent parts of the thread. Sorry for all the text,
but someone's going to ask for it, I'm sure...


------ Orig question

Sometimes my ceph osd tree output is wrong, i.e. OSDs are listed under
the wrong hosts.


Anyone else have this issue?


I have seen this on both Infernalis and Jewel.


Thanks

Wade

-------- update: 'osd crush update on start = false' is not set

[root@cpn00001 ~]# ceph daemon osd.0 config show | grep update | grep crush

[root@cpn00001 ~]# grep update /etc/ceph/ceph.conf

[root@cpn00001 ~]#
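
For completeness, the effective value can also be read straight from the
admin socket (this is the osd_crush_update_on_start option, which defaults
to true); something along these lines:

ceph daemon osd.0 config get osd_crush_update_on_start

and if it were pinned off explicitly it would appear in ceph.conf roughly
like this (it is not set anywhere on my nodes):

[osd]
osd crush update on start = false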

------- examples of the ceph osd tree output and the crushmap

Sure. Apologies for all the text: we have 12 OSD nodes with 15 OSDs per
node, but I will only include a sample:


ceph osd tree | head -35

ID  WEIGHT    TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY

 -1 130.98450 root default

 -2   5.82153     host cpn00001

  4   0.72769         osd.4          up  1.00000          1.00000

 14   0.72769         osd.14         up  1.00000          1.00000

  3   0.72769         osd.3          up  1.00000          1.00000

 24   0.72769         osd.24         up  1.00000          1.00000

  5   0.72769         osd.5          up  1.00000          1.00000

  2   0.72769         osd.2          up  1.00000          1.00000

 17   0.72769         osd.17         up  1.00000          1.00000

 69   0.72769         osd.69         up  1.00000          1.00000

 -3   6.54922     host cpn00003

  7   0.72769         osd.7          up  1.00000          1.00000

  8   0.72769         osd.8          up  1.00000          1.00000

  9   0.72769         osd.9          up  1.00000          1.00000

  0   0.72769         osd.0          up  1.00000          1.00000

 28   0.72769         osd.28         up  1.00000          1.00000

 10   0.72769         osd.10         up  1.00000          1.00000

  1   0.72769         osd.1          up  1.00000          1.00000

  6   0.72769         osd.6          up  1.00000          1.00000

 29   0.72769         osd.29         up  1.00000          1.00000

 -4   2.91077     host cpn00004
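
Side note: another way to see the same mismatch is to ask the monitors
directly where they believe a given OSD sits in the crush hierarchy, e.g.

ceph osd find 6

which prints the crush_location recorded for that OSD.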

Compared with the actual processes that are running:


[root@cpx00001 ~]# ssh cpn00001 ps -ef | grep ceph\-osd

ceph       92638       1 26 16:19 ?        01:00:55 /usr/bin/ceph-osd
-f --cluster ceph --id 6 --setuser ceph --setgroup ceph

ceph       92667       1 20 16:19 ?        00:48:04 /usr/bin/ceph-osd
-f --cluster ceph --id 0 --setuser ceph --setgroup ceph

ceph       92673       1 18 16:19 ?        00:42:48 /usr/bin/ceph-osd
-f --cluster ceph --id 8 --setuser ceph --setgroup ceph

ceph       92681       1 19 16:19 ?        00:45:52 /usr/bin/ceph-osd
-f --cluster ceph --id 7 --setuser ceph --setgroup ceph

ceph       92701       1 15 16:19 ?        00:36:05 /usr/bin/ceph-osd
-f --cluster ceph --id 12 --setuser ceph --setgroup ceph

ceph       92748       1 14 16:19 ?        00:34:07 /usr/bin/ceph-osd
-f --cluster ceph --id 10 --setuser ceph --setgroup ceph

ceph       92756       1 16 16:19 ?        00:38:40 /usr/bin/ceph-osd
-f --cluster ceph --id 9 --setuser ceph --setgroup ceph

ceph       92758       1 17 16:19 ?        00:39:28 /usr/bin/ceph-osd
-f --cluster ceph --id 13 --setuser ceph --setgroup ceph

ceph       92777       1 19 16:19 ?        00:46:17 /usr/bin/ceph-osd
-f --cluster ceph --id 1 --setuser ceph --setgroup ceph

ceph       92988       1 18 16:19 ?        00:42:47 /usr/bin/ceph-osd
-f --cluster ceph --id 5 --setuser ceph --setgroup ceph

ceph       93058       1 18 16:19 ?        00:43:18 /usr/bin/ceph-osd
-f --cluster ceph --id 11 --setuser ceph --setgroup ceph

ceph       93078       1 17 16:19 ?        00:41:38 /usr/bin/ceph-osd
-f --cluster ceph --id 14 --setuser ceph --setgroup ceph

ceph       93127       1 15 16:19 ?        00:36:29 /usr/bin/ceph-osd
-f --cluster ceph --id 4 --setuser ceph --setgroup ceph

ceph       93130       1 17 16:19 ?        00:40:44 /usr/bin/ceph-osd
-f --cluster ceph --id 2 --setuser ceph --setgroup ceph

ceph       93173       1 21 16:19 ?        00:49:37 /usr/bin/ceph-osd
-f --cluster ceph --id 3 --setuser ceph --setgroup ceph

[root@cpx00001 ~]# ssh cpn00003 ps -ef | grep ceph\-osd

ceph       82454       1 18 16:19 ?        00:43:58 /usr/bin/ceph-osd
-f --cluster ceph --id 25 --setuser ceph --setgroup ceph

ceph       82464       1 24 16:19 ?        00:55:40 /usr/bin/ceph-osd
-f --cluster ceph --id 21 --setuser ceph --setgroup ceph

ceph       82473       1 21 16:19 ?        00:50:14 /usr/bin/ceph-osd
-f --cluster ceph --id 17 --setuser ceph --setgroup ceph

ceph       82612       1 19 16:19 ?        00:45:25 /usr/bin/ceph-osd
-f --cluster ceph --id 22 --setuser ceph --setgroup ceph

ceph       82629       1 20 16:19 ?        00:48:38 /usr/bin/ceph-osd
-f --cluster ceph --id 16 --setuser ceph --setgroup ceph

ceph       82651       1 16 16:19 ?        00:39:24 /usr/bin/ceph-osd
-f --cluster ceph --id 20 --setuser ceph --setgroup ceph

ceph       82687       1 17 16:19 ?        00:40:31 /usr/bin/ceph-osd
-f --cluster ceph --id 18 --setuser ceph --setgroup ceph

ceph       82697       1 26 16:19 ?        01:02:12 /usr/bin/ceph-osd
-f --cluster ceph --id 23 --setuser ceph --setgroup ceph

ceph       82719       1 20 16:19 ?        00:47:15 /usr/bin/ceph-osd
-f --cluster ceph --id 15 --setuser ceph --setgroup ceph

ceph       82722       1 14 16:19 ?        00:33:41 /usr/bin/ceph-osd
-f --cluster ceph --id 28 --setuser ceph --setgroup ceph

ceph       82725       1 14 16:19 ?        00:33:16 /usr/bin/ceph-osd
-f --cluster ceph --id 26 --setuser ceph --setgroup ceph

ceph       82743       1 14 16:19 ?        00:34:17 /usr/bin/ceph-osd
-f --cluster ceph --id 29 --setuser ceph --setgroup ceph

ceph       82769       1 19 16:19 ?        00:46:00 /usr/bin/ceph-osd
-f --cluster ceph --id 19 --setuser ceph --setgroup ceph

ceph       82816       1 13 16:19 ?        00:30:26 /usr/bin/ceph-osd
-f --cluster ceph --id 27 --setuser ceph --setgroup ceph

ceph       82828       1 27 16:19 ?        01:04:38 /usr/bin/ceph-osd
-f --cluster ceph --id 24 --setuser ceph --setgroup ceph


[root@cpx00001 ~]#
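
If it is useful for comparison, the OSD data directories on each node can
also be cross-checked against the running daemons; assuming the default
/var/lib/ceph/osd/<cluster>-<id> layout, something like:

ssh cpn00001 'for d in /var/lib/ceph/osd/ceph-*; do echo "$d: $(cat $d/whoami)"; done'

The whoami file in each data directory holds the OSD id it belongs to, so
the list should match the --id values in the ps output above.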

The crushmap looks wrong as well:

(The cluster appears to be operating OK, but this really concerns me.)
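
The text below is the decompiled map; the usual way to pull and decompile
it is:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt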

# begin crush map

tunable choose_local_tries 0

tunable choose_local_fallback_tries 0

tunable choose_total_tries 50

tunable chooseleaf_descend_once 1

tunable straw_calc_version 1


# devices

device 0 osd.0

device 1 osd.1

device 2 osd.2

device 3 osd.3

device 4 osd.4

device 5 osd.5

device 6 osd.6

device 7 osd.7

device 8 osd.8

device 9 osd.9

device 10 osd.10

device 11 osd.11

device 12 osd.12

device 13 osd.13

device 14 osd.14

device 15 osd.15

device 16 osd.16

device 17 osd.17

device 18 osd.18

device 19 osd.19

device 20 osd.20

device 21 osd.21

device 22 osd.22

device 23 osd.23

device 24 osd.24

device 25 osd.25

device 26 osd.26

device 27 osd.27

device 28 osd.28

device 29 osd.29

device 30 osd.30

device 31 osd.31

device 32 osd.32

device 33 osd.33

device 34 osd.34

device 35 osd.35

device 36 osd.36

device 37 osd.37

...


# types

type 0 osd

type 1 host

type 2 chassis

type 3 rack

type 4 row

type 5 pdu

type 6 pod

type 7 room

type 8 datacenter

type 9 region

type 10 root


# buckets

host cpn00001 {

        id -2           # do not change unnecessarily

        # weight 5.822

        alg straw

        hash 0  # rjenkins1

        item osd.4 weight 0.728

        item osd.14 weight 0.728

        item osd.3 weight 0.728

        item osd.24 weight 0.728

        item osd.5 weight 0.728

        item osd.2 weight 0.728

        item osd.17 weight 0.728

        item osd.69 weight 0.728

}

host cpn00003 {

        id -3           # do not change unnecessarily

        # weight 6.549

        alg straw

        hash 0  # rjenkins1

        item osd.7 weight 0.728

        item osd.8 weight 0.728

        item osd.9 weight 0.728

        item osd.0 weight 0.728

        item osd.28 weight 0.728

        item osd.10 weight 0.728

        item osd.1 weight 0.728

        item osd.6 weight 0.728

        item osd.29 weight 0.728

}


host cpn00004 {....
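
For what it's worth, my understanding is that a misplaced entry could be
moved back by hand with something along these lines (only a sketch, not
tried here; osd.6 and cpn00001 are just taken from the mismatch above, and
moving items will of course trigger rebalancing):

ceph osd crush set osd.6 0.72769 host=cpn00001 root=default

But I would rather understand why the map drifted in the first place than
paper over it.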



Thank you for your review !!!!!


-----

Any help would be greatly appreciated, as this has me very concerned.
Otherwise the cluster appears very healthy and is running well.


Thanks


