CentOS 7.2.
...and I think I just figured it out. One node had leftover directories from former OSDs in /var/lib/ceph/osd. When the other OSDs on this host were restarted, Ceph apparently added those to the CRUSH map, too.
[root@sm-cld-mtl-013 osd]# ls -la /var/lib/ceph/osd/
total 128
drwxr-x--- 8 ceph ceph 90 Feb 24 14:44 .
drwxr-x--- 9 ceph ceph 106 Feb 24 14:44 ..
drwxr-xr-x 2 root root 6 Jul 2 2015 ceph-42
drwxr-xr-x 2 root root 6 Jul 2 2015 ceph-43
drwxr-xr-x 1 root root 278 May 4 22:21 ceph-44
drwxr-xr-x 1 root root 278 May 4 22:21 ceph-45
drwxr-xr-x 1 root root 278 May 4 22:25 ceph-67
drwxr-xr-x 1 root root 304 May 4 22:25 ceph-86
42 and 43 are on a different host, yet when 'systemctl start ceph.target' is used, the OSD preflight adds them to the CRUSH map anyway:
May 4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.67 at :/0 osd_data /var/lib/ceph/osd/ceph-67 /var/lib/ceph/osd/ceph-67/journal
May 4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.45 at :/0 osd_data /var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal
May 4 22:13:26 sm-cld-mtl-013 ceph-osd: WARNING: will not setuid/gid: /var/lib/ceph/osd/ceph-42 owned by 0:0 and not requested 167:167
May 4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.529176 7f00cca7c900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-43: (2) No such file or directory#033[0m
May 4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.534657 7fb55c17e900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-42: (2) No such file or directory#033[0m
May 4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service: main process exited, code=exited, status=1/FAILURE
May 4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@43.service entered failed state.
May 4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service failed.
May 4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service: main process exited, code=exited, status=1/FAILURE
May 4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@42.service entered failed state.
May 4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service failed.
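
For anyone who wants to check a host for the same kind of leftovers, here is a minimal sketch in Python. It assumes a stale directory can be recognized by a missing `superblock` file, which is what the errors above complain about; that check and the function name `find_stale_osd_dirs` are only an illustration, not anything shipped with Ceph.

```python
import os
import re

# Matches OSD data directory names such as "ceph-42".
OSD_DIR_PATTERN = re.compile(r"^ceph-(\d+)$")


def find_stale_osd_dirs(base="/var/lib/ceph/osd"):
    """Return sorted OSD ids whose data directory has no 'superblock' file.

    Assumption: a directory without a superblock is a leftover from a
    former OSD. This is a heuristic, so verify before removing anything.
    """
    if not os.path.isdir(base):
        return []
    stale = []
    for name in os.listdir(base):
        match = OSD_DIR_PATTERN.match(name)
        path = os.path.join(base, name)
        # Skip anything that is not a ceph-<id> directory.
        if not match or not os.path.isdir(path):
            continue
        if not os.path.exists(os.path.join(path, "superblock")):
            stale.append(int(match.group(1)))
    return sorted(stale)


if __name__ == "__main__":
    for osd_id in find_stale_osd_dirs():
        print(f"osd.{osd_id}: no superblock, directory looks like a leftover")
```

On the host above this would presumably flag 42 and 43, since those are the two directories the superblock errors point at. It only reports; whether to remove the directories is still a manual call.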
-Ben
On Tue, May 3, 2016 at 7:16 PM, Wade Holler <wade.holler@xxxxxxxxx> wrote:
Hi Ben,
What OS+Version?
Best Regards,
Wade

On Tue, May 3, 2016 at 2:44 PM Ben Hines <bhines@xxxxxxxxx> wrote:
My crush map keeps putting some OSDs on the wrong node. Restarting them fixes it temporarily, but they eventually hop back to the other node that they aren't really on.
Is there anything I should look for that can cause this?
Ceph 9.2.1
-Ben
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com