Hi,

I am testing a Ceph 9.2 cluster. My lab has 1 mon and 2 OSD servers with 4 disks each. Only one OSD server (with its 4 disks) is online; the disks of the second OSD server never come up... Some information about the environment:

[ceph@OSD1 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME                          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 8.00000 root default
-4 8.00000     datacenter dc1
-5 8.00000         room room1
-6 8.00000             row row1
-7 4.00000                 rack rack1
-2 4.00000                     host OSD1
 0 1.00000                         osd.0          up  1.00000          1.00000
 1 1.00000                         osd.1          up  1.00000          1.00000
 2 1.00000                         osd.2          up  1.00000          1.00000
 3 1.00000                         osd.3          up  1.00000          1.00000
-8 4.00000                 rack rack2
-3 4.00000                     host OSD2
 4 1.00000                         osd.4        down  1.00000          1.00000
 5 1.00000                         osd.5        down  1.00000          1.00000
 6 1.00000                         osd.6        down  1.00000          1.00000
 7 1.00000                         osd.7        down  1.00000          1.00000

[ceph@OSD1 ceph-deploy]$ sudo ceph osd dump
epoch 411
fsid d17520de-0d1e-495b-90dc-f7044f7f165f
created 2015-11-14 06:56:36.017672
modified 2015-12-08 09:48:47.685050
flags nodown
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 53 flags hashpspool min_write_recency_for_promote 1 stripe_width 0
max_osd 9
osd.0 up   in  weight 1 up_from 394 up_thru 104 down_at 393 last_clean_interval [388,392) 192.168.64.129:6800/4599 192.168.62.129:6800/4599 192.168.62.129:6801/4599 192.168.64.129:6801/4599 exists,up 499a3624-b2ba-455d-b35a-31d628e1a353
osd.1 up   in  weight 1 up_from 396 up_thru 136 down_at 395 last_clean_interval [390,392) 192.168.64.129:6802/4718 192.168.62.129:6802/4718 192.168.62.129:6803/4718 192.168.64.129:6803/4718 exists,up d7933117-0056-4c3c-ac63-2ad300495e3f
osd.2 up   in  weight 1 up_from 400 up_thru 136 down_at 399 last_clean_interval [392,392) 192.168.64.129:6806/5109 192.168.62.129:6806/5109 192.168.62.129:6807/5109 192.168.64.129:6807/5109 exists,up 7d820897-8d49-4142-8c58-feda8bb04749
osd.3 up   in  weight 1 up_from 398 up_thru 136 down_at 397 last_clean_interval [386,392) 192.168.64.129:6804/4963 192.168.62.129:6804/4963 192.168.62.129:6805/4963 192.168.64.129:6805/4963 exists,up 96270d9d-ed95-40be-9ae4-7bf66aedd4d8
osd.4 down out weight 0 up_from 34  up_thru 53  down_at 58  last_clean_interval [0,0) 192.168.64.130:6800/3615 192.168.64.130:6801/3615 192.168.64.130:6802/3615 192.168.64.130:6803/3615 autoout,exists 6364d590-62fb-4348-b8fe-19b59cd2ceb3
osd.5 down out weight 0 up_from 145 up_thru 151 down_at 203 last_clean_interval [39,54) 192.168.64.130:6800/2784 192.168.62.130:6800/2784 192.168.62.130:6801/2784 192.168.64.130:6801/2784 autoout,exists aa51cdcc-ca9c-436b-b9fc-7bddaef3226d
osd.6 down out weight 0 up_from 44  up_thru 53  down_at 58  last_clean_interval [0,0) 192.168.64.130:6808/4975 192.168.64.130:6809/4975 192.168.64.130:6810/4975 192.168.64.130:6811/4975 autoout,exists 36672496-3346-446a-a617-94c8596e1da2
osd.7 down out weight 0 up_from 155 up_thru 161 down_at 204 last_clean_interval [49,54) 192.168.64.130:6800/2434 192.168.62.130:6800/2434 192.168.62.130:6801/2434 192.168.64.130:6801/2434 autoout,exists 775065fa-8fa8-48ce-a4cc-b034a720fe93

All the UUIDs are correct (the OSDs went down after an upgrade). At that point I was no longer able to create the OSDs with ceph-deploy, so I removed all the OSD disks from the cluster and redeployed them from scratch. I then had some problems getting the OSD service to start at boot, but the service now appears to be running.
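One thing that looks suspicious to me is that the osd dump above shows "flags nodown". I don't know whether it is related, but my next step is to check the cluster flags and clear them, roughly like this (unsetting noup is only a guess on my part, in case that flag is also set):

  # show which cluster-wide flags are currently set
  sudo ceph osd dump | grep flags
  sudo ceph health detail

  # let the monitors manage the up/down state of the OSDs again
  sudo ceph osd unset nodown
  sudo ceph osd unset noup    # only if "noup" actually appears in the flags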
Here is what happens on OSD2 when I start osd.4 by hand:

[ceph@OSD2 ~]$ sudo systemctl start ceph-osd@4.service
[ceph@OSD2 ~]$ sudo systemctl status ceph-osd@4.service -l
ceph-osd@4.service - Ceph object storage daemon
   Loaded: loaded (/etc/systemd/system/ceph.target.wants/ceph-osd@4.service)
   Active: active (running) since Tue 2015-12-08 10:31:38 PST; 9s ago
  Process: 6542 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 6599 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@4.service
           └─6599 /usr/bin/ceph-osd --cluster ceph --id 4 --setuser ceph --setgroup ceph -f

Dec 08 10:31:38 OSD2.local ceph-osd-prestart.sh[6542]: create-or-move updated item name 'osd.4' weight 0.0098 at location {host=OSD2,root=default} to crush map
Dec 08 10:31:38 OSD2.local systemd[1]: Started Ceph object storage daemon.
Dec 08 10:31:38 OSD2.local ceph-osd[6599]: starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4 /dev/sdb1
Dec 08 10:31:38 OSD2.local ceph-osd[6599]: 2015-12-08 10:31:38.702018 7f0a555fc900 -1 osd.4 411 log_to_monitors {default=true}

[ceph@OSD2 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME                          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.03918 root default
-4 4.03918     datacenter dc1
-5 4.03918         room room1
-6 4.03918             row row1
-7 4.00000                 rack rack1
-2 4.00000                     host OSD1
 0 1.00000                         osd.0          up  1.00000          1.00000
 1 1.00000                         osd.1          up  1.00000          1.00000
 2 1.00000                         osd.2          up  1.00000          1.00000
 3 1.00000                         osd.3          up  1.00000          1.00000
-8 0.03918                 rack rack2
-3 0.03918                     host OSD2
 4 0.00980                         osd.4        down        0          1.00000
 5 0.00980                         osd.5        down        0          1.00000
 6 0.00980                         osd.6        down        0          1.00000
 7 0.00980                         osd.7        down        0          1.00000

The systemd unit that starts the OSD is:

[root@OSD2 ceph-4]# cat /etc/systemd/system/ceph.target.wants/ceph-osd\@4.service
[Unit]
Description=Ceph object storage daemon
After=network-online.target local-fs.target
Wants=network-online.target local-fs.target
PartOf=ceph.target

[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/usr/bin/ceph-osd --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph -f
ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=ceph.target

All the disks on OSD2 stay down no matter what I do. Can someone help me troubleshoot this? Any ideas? I'm going crazy over this situation!

Thanks,
Andrea
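P.S. In case it helps, these are the checks I'm planning to run next from the OSD2 side. The log path and port range below are just the package defaults, so treat this as a rough sketch; running "ceph -s" on OSD2 also assumes the admin keyring is present there:

  # ask the running daemon directly through its admin socket
  sudo ceph daemon osd.4 status

  # confirm OSD2 can reach the monitor at all
  sudo ceph -s

  # look for heartbeat or authentication errors in the OSD log
  sudo tail -n 50 /var/log/ceph/ceph-osd.4.log

  # OSDs listen on ports in the 6800-7300 range; make sure the mon and OSD1
  # can actually reach OSD2 on those ports
  ss -tlnp | grep ceph-osd
  sudo firewall-cmd --list-all

If an OSD does finally come up, I understand I will still need to mark it back in, since they are all "out" with weight 0 at the moment (e.g. "sudo ceph osd in 4").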