Re: osds down after upgrade hammer to jewel


 



Well, you said you were running v0.94.9, but are there any OSDs running pre-v0.94.4 as the error states?

On Tue, Mar 28, 2017 at 6:51 AM, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:



On 28/03/17 14:41, Brian Andrus wrote:
What does
# ceph tell osd.* version
reveal? Any pre-v0.94.4 hammer OSDs running as the error states?

ceph tell osd.21 version
Error ENXIO: problem getting command descriptions from osd.21

Yes, as this is the first one I tried to upgrade.
The other ones are still running hammer.

Thanks



On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:
Hi,

I did change the ownership to user ceph. In fact, the OSD processes are running:

ps aux | grep ceph
ceph        2199  0.0  2.7 1729044 918792 ?      Ssl  Mar27   0:21 /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
ceph        2200  0.0  2.7 1721212 911084 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
ceph        2212  0.0  2.8 1732532 926580 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
ceph        2215  0.0  2.8 1743552 935296 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
ceph        2341  0.0  2.7 1715548 908312 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
ceph        2383  0.0  2.7 1694944 893768 ?      Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
[...]

If I run one of the OSDs with the debug level increased

ceph-osd --debug_osd 5 -i 31

this is what I get in the logs:

[...]

0 osd.31 14016 done with init, starting boot process
2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy, booting
2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates one or more pre-v0.94.4 hammer OSDs is running
[...]
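That last log line is the known hammer-to-jewel boot guard: a jewel OSD refuses to finish booting while the osdmap still records any OSD older than v0.94.4. A minimal sketch for checking reported versions yourself; `is_at_least` is a hypothetical helper name, and the cluster loop is shown commented out since it needs a live cluster:

```shell
#!/bin/sh
# Minimum hammer release the jewel boot check requires of every OSD.
MIN=0.94.4

# True if version $1 is at least version $2 (compares with sort -V).
is_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Against a live cluster you could loop over the OSD ids, e.g.:
# for id in $(ceph osd ls); do
#     v=$(ceph tell osd."$id" version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
#     if [ -n "$v" ] && ! is_at_least "$v" "$MIN"; then
#         echo "osd.$id is $v (older than $MIN)"
#     fi
# done

is_at_least 0.94.9 "$MIN" && echo "0.94.9 ok"
is_at_least 0.94.2 "$MIN" || echo "0.94.2 too old"
```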

It seems the OSD is running, but ceph is not aware of it.

Thanks
Jaime




On 27/03/17 21:56, George Mihaiescu wrote:
Make sure the OSD processes on the Jewel node are running. If you didn't change the ownership to user ceph, they won't start.


On Mar 27, 2017, at 11:53, Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:

Hi all,

I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.

The ceph cluster has 3 servers (one mon and one mds each) and another 6 servers with
12 OSDs each.
The mons and mds have been successfully upgraded to the latest jewel release; however,
after upgrading the first OSD server (12 OSDs), ceph is not aware of them and
they are marked as down.

ceph -s

cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
     health HEALTH_WARN
[...]
            12/72 in osds are down
            noout flag(s) set
     osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
            flags noout
[...]

ceph osd tree

3   3.64000         osd.3          down  1.00000          1.00000
8   3.64000         osd.8          down  1.00000          1.00000
14   3.64000         osd.14         down  1.00000          1.00000
18   3.64000         osd.18         down  1.00000          1.00000
21   3.64000         osd.21         down  1.00000          1.00000
28   3.64000         osd.28         down  1.00000          1.00000
31   3.64000         osd.31         down  1.00000          1.00000
37   3.64000         osd.37         down  1.00000          1.00000
42   3.64000         osd.42         down  1.00000          1.00000
47   3.64000         osd.47         down  1.00000          1.00000
51   3.64000         osd.51         down  1.00000          1.00000
56   3.64000         osd.56         down  1.00000          1.00000

If I run this command with one of the down OSDs
ceph osd in 14
osd.14 is already in.
However, ceph doesn't mark it as up and the cluster health remains
in a degraded state.
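For what it's worth, "in" and "up" are independent OSD states, which is why `ceph osd in` changes nothing here. A sketch of how to interrogate the daemon directly on the OSD host (shown as comments; assumes the admin socket is at its default path):

```shell
# "in" = the OSD participates in data placement (CRUSH weight applies);
# "up" = the monitors have seen the daemon boot.
# `ceph osd in 14` only toggles the former; it cannot mark a daemon up.
#
# On the OSD host, ask the running daemon over its admin socket:
#   ceph daemon osd.14 version   # version the daemon itself reports
#   ceph daemon osd.14 status    # boot state and the osdmap epoch it has seen
```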

Do I have to upgrade all the OSDs to jewel first?
Any help is appreciated, as I'm running out of ideas.

Thanks
Jaime

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jaime@xxxxxxxxxxxx
Tel: +353-1-896-3725

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com
