Emperor Upgrade: osds not starting

Gagandeep Arora <aroragagan24@xxxxxxxxx> · Fri, 17 Jan 2014 08:06:15 +1000

Hello,
OSDs fail to start on every node after I upgraded from ceph 0.67.4 to Emperor 0.72.2. The verbose output from trying to start one OSD is below; the same error comes up on all nodes when starting OSDs.

[root@ceph2 ~]# service ceph -v start osd.20
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "user"
=== osd.20 ===
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "run dir"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "pid file"
--- ceph2# mkdir -p /var/run/ceph
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "log dir"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "auto start"
--- ceph2# [ -e /var/run/ceph/osd.20.pid ] || exit 1   # no pid, presumably not running
        pid=`cat /var/run/ceph/osd.20.pid`
        [ -e /proc/$pid ] && grep -q ceph-osd /proc/$pid/cmdline && grep -qwe -i.20 /proc/$pid/cmdline && exit 0 # running
        exit 1  # pid is something else
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "copy executable to"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "osd data"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "fs path"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "devs"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "btrfs devs"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "lock file"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "admin socket"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "max open files"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "restart on core dump"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "valgrind"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "osd crush update on start"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "osd crush location"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "osd crush initial weight"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.20 "keyring"

--- ceph2# timeout 10 /usr/bin/ceph                     --name=osd.20                   --keyring=/var/lib/ceph/osd/ceph-20/keyring                     osd crush create-or-move                        --        20                       --- ceph2# df /var/lib/ceph/osd/ceph-20/. | tail -1 | awk '{ d= $2/1073741824 ; r = sprintf("%.2f", d); print r }'

0.45                    root=default                    host=ceph2
Invalid command:  --- doesn't represent a float
osd crush create-or-move <osdname (id|osd.id)> <float[0.0-]> <args> [<args>...] :  create entry or move existing entry for <name> <weight> at/to location <args>

Error EINVAL: invalid command
bash: line 1: 0.45: command not found
failed: 'timeout 10 /usr/bin/ceph                       --name=osd.20                   --keyring=/var/lib/ceph/osd/ceph-20/keyring                     osd crush create-or-move                        --        20                       --- ceph2# df /var/lib/ceph/osd/ceph-20/. | tail -1 | awk '{ d= $2/1073741824 ; r = sprintf("%.2f", d); print r }'

0.45                    root=default                    host=ceph2              
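
For reference, in the failing command above the position where the weight belongs holds the echoed "--- ceph2# df ..." line rather than a number, and the monitor rejects "---" as not a float. The weight itself comes from the df/awk pipeline shown in the log: column 2 of df divided by 1073741824, printed with two decimals. Below is a minimal sketch of that calculation as standalone shell functions. The names crush_weight_from_kblocks and is_valid_weight are hypothetical and not part of the init script, and the sketch assumes df reports sizes in 1K blocks.

```shell
# Hypothetical helper, not part of /etc/init.d/ceph: converts a size in
# 1K blocks (column 2 of df output, assumed) into a weight using the same
# awk expression that appears in the log above.
crush_weight_from_kblocks() {
    echo "$1" | awk '{ d = $1/1073741824; r = sprintf("%.2f", d); print r }'
}

# Hypothetical check: accept only a plain non-negative decimal, which is
# what "osd crush create-or-move" asks for as <float[0.0-]>.
is_valid_weight() {
    case "$1" in
        ''|*[!0-9.]*|*.*.*|.) return 1 ;;  # empty, stray characters, two dots, lone dot
        *) return 0 ;;
    esac
}

# Example with a literal block count; on a real node the number would come
# from: df /var/lib/ceph/osd/ceph-20/. | tail -1 | awk '{ print $2 }'
w=$(crush_weight_from_kblocks 536870912)
if is_valid_weight "$w"; then
    echo "weight=$w"
else
    echo "not a usable weight: $w" >&2
fi
```

Running the calculation by hand like this separates the number from whatever else the script prints, so it is easier to see whether the weight or the surrounding command line is at fault.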

However, the OSDs do start when I run ceph-osd -c /etc/ceph/ceph.conf -i <osdnum> directly, just not through service ceph or /etc/init.d/ceph. After I started all the OSDs, ceph reported a health warning that a pool "has too few pgs". I deleted that pool, as there wasn't any important data in it. The same warning now comes up for a different pool.

[root@ceph1 ~]# ceph -s
    cluster c0459c67-e2cd-45f7-b580-dec1afc9dea5
     health HEALTH_WARN pool vmware-backups has too few pgs
     monmap e3: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 17684, quorum 0,1,2 a,b,c
     mdsmap e28128: 1/1/1 up {0=a=up:active}, 1 up:standby
     osdmap e7053: 30 osds: 30 up, 30 in
      pgmap v16242514: 6348 pgs, 12 pools, 9867 GB data, 2543 kobjects
            19775 GB used, 58826 GB / 78602 GB avail
                6343 active+clean
                   5 active+clean+scrubbing+deep
  client io 0 B/s rd, 617 kB/s wr, 81 op/s

[root@ceph1 ~]# ceph osd pool delete vmware-backups vmware-backups --yes-i-really-really-mean-it
pool 'vmware-backups' deleted
[root@ceph1 ~]# ceph -s
    cluster c0459c67-e2cd-45f7-b580-dec1afc9dea5
     health HEALTH_WARN pool centaur-backups has too few pgs
     monmap e3: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 17684, quorum 0,1,2 a,b,c
     mdsmap e28128: 1/1/1 up {0=a=up:active}, 1 up:standby
     osdmap e7054: 30 osds: 30 up, 30 in
      pgmap v16243076: 6048 pgs, 12 pools, 4437 GB data, 1181 kobjects
            19775 GB used, 58826 GB / 78602 GB avail
                6047 active+clean
                   1 active+clean+scrubbing+deep
  client io 54836 kB/s rd, 699 op/s

Regards,
Gagan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com