Re: osds down after upgrade hammer to jewel


 



Well, you said you were running v0.94.9, but are there any OSDs running pre-v0.94.4 as the error states?

On Tue, Mar 28, 2017 at 6:51 AM, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:



On 28/03/17 14:41, Brian Andrus wrote:
What does
# ceph tell osd.* version
reveal? Any pre-v0.94.4 hammer OSDs running as the error states?

ceph tell osd.21 version
Error ENXIO: problem getting command descriptions from osd.21

Yes, as this is the first one I tried to upgrade.
The other ones are still running hammer.

Thanks



On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:
Hi,

I did change the ownership to user ceph. In fact, the OSD processes are running:

ps aux | grep ceph
ceph        2199  0.0  2.7 1729044 918792 ?      Ssl  Mar27   0:21 /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
ceph        2200  0.0  2.7 1721212 911084 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
ceph        2212  0.0  2.8 1732532 926580 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
ceph        2215  0.0  2.8 1743552 935296 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
ceph        2341  0.0  2.7 1715548 908312 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
ceph        2383  0.0  2.7 1694944 893768 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
[...]

If I run one of the OSDs with the debug level increased

ceph-osd --debug_osd 5 -i 31

this is what I get in the logs:

[...]

0 osd.31 14016 done with init, starting boot process
2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy, booting
2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates one or more pre-v0.94.4 hammer OSDs is running
[...]
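That last log line is the known hammer-to-jewel boot guard: a jewel OSD refuses to finish booting while the osdmap still records any OSD older than v0.94.4. A minimal sketch for checking reported versions yourself; `is_at_least` is a hypothetical helper name, and the cluster loop is shown commented out since it needs a live cluster:

```shell
#!/bin/sh
# Minimum hammer release the jewel boot check requires of every OSD.
MIN=0.94.4

# True if version $1 is at least version $2 (compares with sort -V).
is_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Against a live cluster you could loop over the OSD ids, e.g.:
# for id in $(ceph osd ls); do
#     v=$(ceph tell osd."$id" version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
#     if [ -n "$v" ] && ! is_at_least "$v" "$MIN"; then
#         echo "osd.$id is $v (older than $MIN)"
#     fi
# done

is_at_least 0.94.9 "$MIN" && echo "0.94.9 ok"
is_at_least 0.94.2 "$MIN" || echo "0.94.2 too old"
```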

It seems the OSD is running, but ceph is not aware of it.

Thanks
Jaime




On 27/03/17 21:56, George Mihaiescu wrote:
Make sure the OSD processes on the Jewel node are running. If you didn't change the ownership to user ceph, they won't start.


On Mar 27, 2017, at 11:53, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:

Hi all,

I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.

The ceph cluster has 3 servers (one mon and one mds each) and another 6 servers with
12 OSDs each.
The mons and mds have been successfully upgraded to the latest jewel release; however,
after upgrading the first OSD server (12 OSDs), ceph is not aware of them and
they are marked as down.

ceph -s

cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
     health HEALTH_WARN
[...]
            12/72 in osds are down
            noout flag(s) set
     osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
            flags noout
[...]

ceph osd tree

3   3.64000         osd.3          down  1.00000          1.00000
8   3.64000         osd.8          down  1.00000          1.00000
14   3.64000         osd.14         down  1.00000          1.00000
18   3.64000         osd.18         down  1.00000          1.00000
21   3.64000         osd.21         down  1.00000          1.00000
28   3.64000         osd.28         down  1.00000          1.00000
31   3.64000         osd.31         down  1.00000          1.00000
37   3.64000         osd.37         down  1.00000          1.00000
42   3.64000         osd.42         down  1.00000          1.00000
47   3.64000         osd.47         down  1.00000          1.00000
51   3.64000         osd.51         down  1.00000          1.00000
56   3.64000         osd.56         down  1.00000          1.00000

If I run this command with one of the down OSDs
ceph osd in 14
osd.14 is already in.
However, ceph doesn't mark it as up and the cluster health remains
in a degraded state.
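For what it's worth, "in" and "up" are independent OSD states, which is why `ceph osd in` changes nothing here. A sketch of how to interrogate the daemon directly on the OSD host (shown as comments; assumes the admin socket is at its default path):

```shell
# "in" = the OSD participates in data placement (CRUSH weight applies);
# "up" = the monitors have seen the daemon boot.
# `ceph osd in 14` only toggles the former; it cannot mark a daemon up.
#
# On the OSD host, ask the running daemon over its admin socket:
#   ceph daemon osd.14 version   # version the daemon itself reports
#   ceph daemon osd.14 status    # boot state and the osdmap epoch it has seen
```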

Do I have to upgrade all the OSDs to jewel first?
Any help is appreciated, as I'm running out of ideas.

Thanks
Jaime

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jaime@xxxxxxxxxxxx
Tel: +353-1-896-3725

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com
