Re: Re: Re: Re: Re: Re: how to transfer ceph cluster from the old network-and-hosts to a new one

Hi!

  To have all PGs in the active+clean state you need enough nodes and OSDs to hold all PG copies (this depends on your pools' size). If your pools have size 3 (the default), then you need 3 nodes with enough OSDs alive.

  If you want to migrate from the old hardware to new, then I would recommend connecting a new node to the cluster, bringing up all OSDs on that node, and then removing the OSDs one by one (waiting for rebalancing to finish each time) from one of the old nodes. Then repeat with the next old/new node pair, and so on until you finish your migration.
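
  For a single OSD the removal sequence would look roughly like this (just a sketch; osd.8 is an example id, adjust to your cluster):

ceph osd crush reweight osd.8 0    # drain the OSD; wait until ceph -s shows active+clean
ceph osd out 8                     # mark it out
systemctl stop ceph-osd@8          # on the old node: stop the daemon
ceph osd crush remove osd.8        # remove it from the CRUSH map
ceph auth del osd.8                # remove its auth key
ceph osd rm 8                      # remove it from the OSD map

  Repeat for each OSD on the old host, and only move on to the next host once ceph -s reports all PGs active+clean again.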

Best regards,
Vladimir


Best regards,
Дробышевский Владимир
"АйТи Город" company
+7 343 2222192

Hardware and software
IBM, Microsoft, Eset
Turnkey project delivery
IT services outsourcing

2016-07-27 7:32 GMT+05:00 朱 彤 <besthopeall@xxxxxxxxxxx>:

@Владимир I'll try that, thanks.


Now when I remove the old OSD, the PGs get remapped but stay stuck. http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-osds/ describes this as a "CRUSH corner case where some PGs remain stuck in the active+remapped state".

I have tried:

ceph osd crush reweight osd.8 0, and I can also see in ceph osd tree that its weight is 0, but the cluster is still stuck. Before this, it was active+clean.

# ceph osd tree
ID WEIGHT  TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02939 root default
-2 0.01469     host ceph-node1
 0 0.00490         osd.0            up  1.00000          1.00000
 1 0.00490         osd.1            up  1.00000          1.00000
 2 0.00490         osd.2            up  1.00000          1.00000
-3       0     host ceph-node3
 8       0         osd.8            up  1.00000          1.00000
-4 0.01469     host ceph-node2
 5 0.00490         osd.5            up  1.00000          1.00000
 6 0.00490         osd.6            up  1.00000          1.00000
 7 0.00490         osd.7            up  1.00000          1.00000
# ceph status
    cluster eee6caf2-a7c6-411c-8711-a87aa4a66bf2
     health HEALTH_WARN
            clock skew detected on mon.ceph-node3
            64 pgs stuck unclean
            too few PGs per OSD (18 < min 30)
            Monitor clock skew detected
     monmap e2: 2 mons at {ceph-node1=192.168.57.101:6789/0,ceph-node3=192.168.57.103:6789/0}
            election epoch 48, quorum 0,1 ceph-node1,ceph-node3
     osdmap e134: 7 osds: 7 up, 7 in; 64 remapped pgs
            flags sortbitwise
      pgmap v1387: 64 pgs, 1 pools, 0 bytes data, 0 objects
            260 MB used, 35502 MB / 35762 MB avail
                  41 active+remapped
                  23 active
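
For reference, these are the checks I plan to run next (just a sketch based on the docs; osd.8 is the OSD reweighted to 0 above):

ceph health detail            # shows which PGs are stuck and why
ceph pg dump_stuck unclean    # lists the stuck PGs with their up/acting OSD sets
ceph osd out 8                # mark the drained OSD out before removing it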

thanks.



From: vlad@xxxxxxxxxx <vlad@xxxxxxxxxx> on behalf of Владимир Дробышевский <v.heathen@xxxxxxxxx>
Sent: July 26, 2016 10:08:30
To: 朱 彤
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Re: Re: Re: Re: how to transfer ceph cluster from the old network-and-hosts to a new one
 
Hello!

  As far as I know, the 'admin node' is just a node with ceph-deploy and the initial config/keys directory (if I'm wrong, somebody will correct me, I hope). So you just need to install ceph-deploy (if you are going to use it further) and move the ceph user's cluster config/keys directory to the new node. In every other sense the nodes are equal.
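
  Moving the admin role would then look roughly like this (a sketch; the hostnames and the ~/my-cluster directory are placeholders for your own setup):

yum install -y ceph-deploy ceph-common                 # on the new admin node, assuming the ceph repos are configured
scp -r cephuser@old-admin:~/my-cluster ~/my-cluster    # copy the ceph-deploy working dir (ceph.conf + keyrings)
ceph-deploy admin new-admin-node                       # run from a node holding the working dir: pushes ceph.conf + admin keyring
ceph -s                                                # verify the new node can reach the cluster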

Best regards, 
Vladimir



2016-07-26 13:08 GMT+05:00 朱 彤 <besthopeall@xxxxxxxxxxx>:
Thanks! The cluster is active+clean again.

Basically this proves that OSDs and MONs can be transferred via rebalancing. What about the admin node? After adding the new OSDs and MONs and shutting down the old ones, should I also set up a new admin node and "turn off" the old one?

From: vlad@xxxxxxxxxx <vlad@xxxxxxxxxx> on behalf of Владимир Дробышевский <v.heathen@xxxxxxxxx>
Sent: July 26, 2016 6:05:05
To: 朱 彤
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Re: Re: Re: how to transfer ceph cluster from the old network-and-hosts to a new one
 
Hi!
  
  As I can see from here:
  osdmap e99: 9 osds: 9 up, 6 in; 64 remapped pgs
  3 of your OSDs are out of the cluster. It seems that you've tried to remove all of your ceph-node3 OSDs at once.

 This
   53 active+remapped
   48 active+undersized+degraded
  means that you have degraded objects. I would say that your pools have size=3, so since you've taken out all the OSDs on one of your nodes (1/3 of all your OSDs), you now have degraded PGs with only 2 copies instead of 3.
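
  You can verify the pool sizes yourself, for example (a quick sketch; 'rbd' is just the default pool name, substitute your own pools):

ceph osd dump | grep 'replicated size'   # shows size/min_size for every pool
ceph osd pool get rbd size               # per-pool check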

Besides, systemctl status ceph-osd@3 and systemctl status ceph-osd@4 give the same result:  Active: inactive (dead)
Probably because neither daemon is running. You can try 'ps ax | grep osd' and check the output.

Best regards,
Vladimir



2016-07-26 10:11 GMT+05:00 朱 彤 <besthopeall@xxxxxxxxxxx>:

Now the service can be found, thanks. However, both ceph osd tree and ceph status show that osd.3 is still up, although ceph status also shows "degraded, stuck unclean...", which I think is because of the clock skew on the second MON.

Besides, systemctl status ceph-osd@3 and systemctl status ceph-osd@4 give the same result:  Active: inactive (dead)

[root@ceph-node1 ~]# systemctl stop ceph-osd@3
[root@ceph-node1 ~]# ceph osd tree
ID WEIGHT  TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.04408 root default
-2 0.01469     host ceph-node1
 0 0.00490         osd.0            up  1.00000          1.00000
 1 0.00490         osd.1            up  1.00000          1.00000
 2 0.00490         osd.2            up  1.00000          1.00000
-3 0.01469     host ceph-node3
 3 0.00490         osd.3            up        0          1.00000
 4 0.00490         osd.4            up        0          1.00000
 8 0.00490         osd.8            up        0          1.00000
-4 0.01469     host ceph-node2
 5 0.00490         osd.5            up  1.00000          1.00000
 6 0.00490         osd.6            up  1.00000          1.00000
 7 0.00490         osd.7            up  1.00000          1.00000
[root@ceph-node1 ~]# ceph status
    cluster eee6caf2-a7c6-411c-8711-a87aa4a66bf2
     health HEALTH_WARN
            clock skew detected on mon.ceph-node3
            48 pgs degraded
            112 pgs stuck unclean
            48 pgs undersized
            recovery 342/513 objects degraded (66.667%)
            Monitor clock skew detected
     monmap e2: 2 mons at {ceph-node1=192.168.57.101:6789/0,ceph-node3=192.168.57.103:6789/0}
            election epoch 44, quorum 0,1 ceph-node1,ceph-node3
     osdmap e99: 9 osds: 9 up, 6 in; 64 remapped pgs
            flags sortbitwise
      pgmap v45477: 112 pgs, 7 pools, 1636 bytes data, 171 objects
            224 MB used, 30429 MB / 30653 MB avail
            342/513 objects degraded (66.667%)
                  53 active+remapped
                  48 active+undersized+degraded
                  11 active
[root@ceph-node1 ~]# systemctl status ceph-osd@3
● ceph-osd@3.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: inactive (dead)




From: vlad@xxxxxxxxxx <vlad@xxxxxxxxxx> on behalf of Владимир Дробышевский <v.heathen@xxxxxxxxx>
Sent: July 26, 2016 4:47:14
To: 朱 彤
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Re: Re: how to transfer ceph cluster from the old network-and-hosts to a new one
 
Hi!

  You should use ceph-osd@<num> as a service name, not ceph, and systemctl as a service control utility.

  For example, 'systemctl stop ceph-osd@3'
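
  So a minimal sequence for one OSD would be (a sketch, using id 3 from your output):

systemctl stop ceph-osd@3     # run on the node that hosts osd.3
systemctl status ceph-osd@3   # should now report inactive (dead)
ceph osd tree                 # a little later the cluster should report osd.3 as down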

Best regards,
Vladimir



2016-07-26 7:11 GMT+05:00 朱 彤 <besthopeall@xxxxxxxxxxx>:

@Дробышевский thanks, I have tried that, but

# service ceph status
Redirecting to /bin/systemctl status  ceph.service
● ceph.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

compared to another node that has /etc/init.d/ceph and also runs CentOS 7:
service ceph status
=== osd.6 ===
osd.6: running failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok version 2>/dev/null'

They have the same OS, but one has /etc/init.d/ceph and the other doesn't. I probably skipped some steps by mistake.
Since /etc/init.d/ceph is a script, can I just copy it over to make it work?
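
For what it's worth, this is how I am checking which ceph units exist under systemd on each node (a sketch):

systemctl list-units 'ceph*'         # loaded ceph units (mon/osd instances)
ls /usr/lib/systemd/system/ceph*     # unit files installed by the ceph packages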





From: Дробышевский, Владимир <vlad@xxxxxxxxxx>
Sent: July 25, 2016 12:43:54
To: 朱 彤
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Re: how to transfer ceph cluster from the old network-and-hosts to a new one
 
Hi!

  CentOS 7 uses systemd, so you should stop an OSD with 'systemctl stop ceph-osd@<num>'

Best regards,
Vladimir



2016-07-25 13:44 GMT+05:00 朱 彤 <besthopeall@xxxxxxxxxxx>:
@Henrik Korkuc thanks for the tip, working on it. In order to stop an OSD, I need to run /etc/init.d/ceph stop osd.num, but I just noticed /etc/init.d/ceph is missing: no such file or directory. I used ceph-deploy to install the cluster on CentOS 7. Any idea?

Thanks!

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Henrik Korkuc <lists@xxxxxxxxx>
Sent: July 25, 2016 8:03:34
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: how to transfer ceph cluster from the old network-and-hosts to a new one
 
On 16-07-25 10:55, 朱 彤 wrote:

Hi all,


I'm looking for a method to transfer a Ceph cluster.


Now the cluster is located in network1, which has hosts A, B, C...


And the target is to transfer it to network2, which has hosts a, b, c...


What I can think of is adding hosts a, b, c into the current cluster, as when adding OSDs and MONs. Then, after the data has been rebalanced, take down the OSDs and MONs on hosts A, B, C.


Then the question would be: how do I know when the old OSDs can be safely taken down?


This method involves a lot of redundant operations. Other than creating OSDs and MONs in the new environment, should I also create PGs and pools just like the old cluster has?


Is there a more direct way to shift the cluster from the old network and hosts to a new one?


Hey,
please refer to the recent post named "change of dns names and IP addresses of cluster members" on this mailing list. If both networks are interconnected, then the migration would be quite easy.
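
If you re-IP the existing hosts, the monitor addresses are the main part that needs manual work; roughly (a sketch of the "messy" procedure from the Ceph docs; the mon name and the 192.168.100.x address are placeholders):

ceph mon getmap -o /tmp/monmap                                  # grab the current monmap
monmaptool --rm ceph-node1 /tmp/monmap                          # drop the monitor's old address
monmaptool --add ceph-node1 192.168.100.101:6789 /tmp/monmap    # add it back with the new address
ceph-mon -i ceph-node1 --inject-monmap /tmp/monmap              # with that mon stopped, inject the edited map
# then update mon_host / mon_initial_members in ceph.conf and start the monitor again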

Thanks!




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
