Hi,

I am testing a Ceph 9.2 cluster. My lab has 1 mon and 2 OSD servers with 4 disks each. Only one OSD server (with its 4 disks) is online; the disks of the second OSD server never come up... Some information about the environment:

[ceph@OSD1 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME                          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 8.00000 root default
-4 8.00000     datacenter dc1
-5 8.00000         room room1
-6 8.00000             row row1
-7 4.00000                 rack rack1
-2 4.00000                     host OSD1
 0 1.00000                         osd.0          up  1.00000          1.00000
 1 1.00000                         osd.1          up  1.00000          1.00000
 2 1.00000                         osd.2          up  1.00000          1.00000
 3 1.00000                         osd.3          up  1.00000          1.00000
-8 4.00000                 rack rack2
-3 4.00000                     host OSD2
 4 1.00000                         osd.4        down  1.00000          1.00000
 5 1.00000                         osd.5        down  1.00000          1.00000
 6 1.00000                         osd.6        down  1.00000          1.00000
 7 1.00000                         osd.7        down  1.00000          1.00000

[ceph@OSD1 ceph-deploy]$ sudo ceph osd dump
epoch 411
fsid d17520de-0d1e-495b-90dc-f7044f7f165f
created 2015-11-14 06:56:36.017672
modified 2015-12-08 09:48:47.685050
flags nodown
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 53 flags hashpspool min_write_recency_for_promote 1 stripe_width 0
max_osd 9
osd.0 up   in  weight 1 up_from 394 up_thru 104 down_at 393 last_clean_interval [388,392) 192.168.64.129:6800/4599 192.168.62.129:6800/4599 192.168.62.129:6801/4599 192.168.64.129:6801/4599 exists,up 499a3624-b2ba-455d-b35a-31d628e1a353
osd.1 up   in  weight 1 up_from 396 up_thru 136 down_at 395 last_clean_interval [390,392) 192.168.64.129:6802/4718 192.168.62.129:6802/4718 192.168.62.129:6803/4718 192.168.64.129:6803/4718 exists,up d7933117-0056-4c3c-ac63-2ad300495e3f
osd.2 up   in  weight 1 up_from 400 up_thru 136 down_at 399 last_clean_interval [392,392) 192.168.64.129:6806/5109 192.168.62.129:6806/5109 192.168.62.129:6807/5109 192.168.64.129:6807/5109 exists,up 7d820897-8d49-4142-8c58-feda8bb04749
osd.3 up   in  weight 1 up_from 398 up_thru 136 down_at 397 last_clean_interval [386,392) 192.168.64.129:6804/4963 192.168.62.129:6804/4963 192.168.62.129:6805/4963 192.168.64.129:6805/4963 exists,up 96270d9d-ed95-40be-9ae4-7bf66aedd4d8
osd.4 down out weight 0 up_from 34  up_thru 53  down_at 58  last_clean_interval [0,0) 192.168.64.130:6800/3615 192.168.64.130:6801/3615 192.168.64.130:6802/3615 192.168.64.130:6803/3615 autoout,exists 6364d590-62fb-4348-b8fe-19b59cd2ceb3
osd.5 down out weight 0 up_from 145 up_thru 151 down_at 203 last_clean_interval [39,54) 192.168.64.130:6800/2784 192.168.62.130:6800/2784 192.168.62.130:6801/2784 192.168.64.130:6801/2784 autoout,exists aa51cdcc-ca9c-436b-b9fc-7bddaef3226d
osd.6 down out weight 0 up_from 44  up_thru 53  down_at 58  last_clean_interval [0,0) 192.168.64.130:6808/4975 192.168.64.130:6809/4975 192.168.64.130:6810/4975 192.168.64.130:6811/4975 autoout,exists 36672496-3346-446a-a617-94c8596e1da2
osd.7 down out weight 0 up_from 155 up_thru 161 down_at 204 last_clean_interval [49,54) 192.168.64.130:6800/2434 192.168.62.130:6800/2434 192.168.62.130:6801/2434 192.168.64.130:6801/2434 autoout,exists 775065fa-8fa8-48ce-a4cc-b034a720fe93

All the UUIDs are correct (the OSDs went down after an upgrade). At that point I was no longer able to create the OSDs with ceph-deploy, so I removed all the OSD disks from the cluster and redeployed them from scratch. I then had some problems getting the OSD service to start at boot, but the service now appears to be running.
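One thing that looks suspicious to me is that the osd dump above shows "flags nodown". I don't know whether it is related, but my next step is to check the cluster flags and clear them, roughly like this (unsetting noup is only a guess on my part, in case that flag is also set):

  # show which cluster-wide flags are currently set
  sudo ceph osd dump | grep flags
  sudo ceph health detail

  # let the monitors manage the up/down state of the OSDs again
  sudo ceph osd unset nodown
  sudo ceph osd unset noup    # only if "noup" actually appears in the flags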
Here is what happens on OSD2 when I start osd.4 by hand:

[ceph@OSD2 ~]$ sudo systemctl start ceph-osd@4.service
[ceph@OSD2 ~]$ sudo systemctl status ceph-osd@4.service -l
ceph-osd@4.service - Ceph object storage daemon
   Loaded: loaded (/etc/systemd/system/ceph.target.wants/ceph-osd@4.service)
   Active: active (running) since Tue 2015-12-08 10:31:38 PST; 9s ago
  Process: 6542 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 6599 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@4.service
           └─6599 /usr/bin/ceph-osd --cluster ceph --id 4 --setuser ceph --setgroup ceph -f

Dec 08 10:31:38 OSD2.local ceph-osd-prestart.sh[6542]: create-or-move updated item name 'osd.4' weight 0.0098 at location {host=OSD2,root=default} to crush map
Dec 08 10:31:38 OSD2.local systemd[1]: Started Ceph object storage daemon.
Dec 08 10:31:38 OSD2.local ceph-osd[6599]: starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4 /dev/sdb1
Dec 08 10:31:38 OSD2.local ceph-osd[6599]: 2015-12-08 10:31:38.702018 7f0a555fc900 -1 osd.4 411 log_to_monitors {default=true}

[ceph@OSD2 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME                          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.03918 root default
-4 4.03918     datacenter dc1
-5 4.03918         room room1
-6 4.03918             row row1
-7 4.00000                 rack rack1
-2 4.00000                     host OSD1
 0 1.00000                         osd.0          up  1.00000          1.00000
 1 1.00000                         osd.1          up  1.00000          1.00000
 2 1.00000                         osd.2          up  1.00000          1.00000
 3 1.00000                         osd.3          up  1.00000          1.00000
-8 0.03918                 rack rack2
-3 0.03918                     host OSD2
 4 0.00980                         osd.4        down        0          1.00000
 5 0.00980                         osd.5        down        0          1.00000
 6 0.00980                         osd.6        down        0          1.00000
 7 0.00980                         osd.7        down        0          1.00000

The systemd unit that starts the OSD is:

[root@OSD2 ceph-4]# cat /etc/systemd/system/ceph.target.wants/ceph-osd\@4.service
[Unit]
Description=Ceph object storage daemon
After=network-online.target local-fs.target
Wants=network-online.target local-fs.target
PartOf=ceph.target

[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/usr/bin/ceph-osd --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph -f
ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=ceph.target

All the disks on OSD2 stay down no matter what I do. Can someone help me troubleshoot this? Any ideas? I'm going crazy over this situation!

Thanks,
Andrea
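P.S. In case it helps, these are the checks I'm planning to run next from the OSD2 side. The log path and port range below are just the package defaults, so treat this as a rough sketch; running "ceph -s" on OSD2 also assumes the admin keyring is present there:

  # ask the running daemon directly through its admin socket
  sudo ceph daemon osd.4 status

  # confirm OSD2 can reach the monitor at all
  sudo ceph -s

  # look for heartbeat or authentication errors in the OSD log
  sudo tail -n 50 /var/log/ceph/ceph-osd.4.log

  # OSDs listen on ports in the 6800-7300 range; make sure the mon and OSD1
  # can actually reach OSD2 on those ports
  ss -tlnp | grep ceph-osd
  sudo firewall-cmd --list-all

If an OSD does finally come up, I understand I will still need to mark it back in, since they are all "out" with weight 0 at the moment (e.g. "sudo ceph osd in 4").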