What does
# ceph tell osd.* version
reveal? Are any pre-v0.94.4 hammer OSDs running, as the error states?
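Since the down OSDs won't answer "ceph tell", a rough per-host alternative is something like the below (the hostnames are just placeholders for your OSD servers):
# for h in osd-host-1 osd-host-2; do ssh $h 'hostname; ceph-osd --version'; done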
On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:
Hi,
I did change the ownership to user ceph. In fact, the OSD processes are running:
ps aux | grep ceph
ceph 2199 0.0 2.7 1729044 918792 ? Ssl Mar27 0:21 /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
ceph 2200 0.0 2.7 1721212 911084 ? Ssl Mar27 0:20 /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
ceph 2212 0.0 2.8 1732532 926580 ? Ssl Mar27 0:20 /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
ceph 2215 0.0 2.8 1743552 935296 ? Ssl Mar27 0:20 /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
ceph 2341 0.0 2.7 1715548 908312 ? Ssl Mar27 0:20 /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
ceph 2383 0.0 2.7 1694944 893768 ? Ssl Mar27 0:20 /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
[...]
If I run one of the OSDs with increased debug
ceph-osd --debug_osd 5 -i 31
this is what I get in the logs
[...]
0 osd.31 14016 done with init, starting boot process
2017-03-28 09:19:15.280182 7f083df0c800 1 osd.31 14016 We are healthy, booting
2017-03-28 09:19:15.280685 7f081cad3700 1 osd.31 14016 osdmap indicates one or more pre-v0.94.4 hammer OSDs is running
[...]
It seems the OSD is running, but ceph is not aware of it.
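One way to confirm that from the daemon's side, assuming the admin socket is in the default /var/run/ceph location, is to ask the running OSD directly on its host:
# ceph daemon osd.31 status
While the OSD is stuck like this, the "state" field there should show something like "booting" or "preboot" rather than "active".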
Thanks
Jaime
On 27/03/17 21:56, George Mihaiescu wrote:
Make sure the OSD processes on the Jewel node are running. If you didn't change the ownership to user ceph, they won't start.
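For reference, the ownership change from the Jewel release notes is roughly the following, run on the OSD host with the OSDs stopped (paths assume the default /var/lib/ceph layout):
# chown -R ceph:ceph /var/lib/ceph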
On Mar 27, 2017, at 11:53, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:
Hi all,
I'm upgrading a ceph cluster from Hammer 0.94.9 to Jewel 10.2.6.
The cluster has 3 servers (one mon and one mds each) and another 6 servers with
12 OSDs each.
The mons and mds have been successfully upgraded to the latest Jewel release; however,
after upgrading the first OSD server (12 OSDs), ceph is not aware of its OSDs and
they are marked as down
ceph -s
cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
health HEALTH_WARN
[...]
12/72 in osds are down
noout flag(s) set
osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
flags noout
[...]
ceph osd tree
3 3.64000 osd.3 down 1.00000 1.00000
8 3.64000 osd.8 down 1.00000 1.00000
14 3.64000 osd.14 down 1.00000 1.00000
18 3.64000 osd.18 down 1.00000 1.00000
21 3.64000 osd.21 down 1.00000 1.00000
28 3.64000 osd.28 down 1.00000 1.00000
31 3.64000 osd.31 down 1.00000 1.00000
37 3.64000 osd.37 down 1.00000 1.00000
42 3.64000 osd.42 down 1.00000 1.00000
47 3.64000 osd.47 down 1.00000 1.00000
51 3.64000 osd.51 down 1.00000 1.00000
56 3.64000 osd.56 down 1.00000 1.00000
If I run this command on one of the down OSDs
ceph osd in 14
osd.14 is already in.
However, ceph doesn't mark it as up, and the cluster health remains
in a degraded state.
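The "in" flag and the "up" state are tracked separately, so the OSD itself has to boot and report up. A minimal check that the daemon is at least alive on the node, assuming the systemd units that ship with Jewel, would be something like:
# systemctl status ceph-osd@14
# ceph daemon osd.14 status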
Do I have to upgrade all the OSDs to Jewel first?
Any help would be appreciated, as I'm running out of ideas.
Thanks
Jaime
--
Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jaime@xxxxxxxxxxxx
Tel: +353-1-896-3725
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com