Re: add hard drives to 3 CEPH servers (3 server cluster)

James Okken <James.Okken@xxxxxxxxxxxx> · Fri, 15 Dec 2017 18:11:29 +0000

Thanks again Cary,

Yes, once all the backfilling was done I was back to a healthy cluster.
I have moved on to the same steps for the next server in the cluster; it is backfilling now.
Once that is done I will do the last server in the cluster, and then I think I am done!

Just checking on one thing: I get the messages below when running this command. I assume this is OK, right?
root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07
2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f
2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: /var/lib/ceph/osd/ceph-4/keyring: can't open /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring /var/lib/ceph/osd/ceph-4/keyring
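For what it's worth, these look like the usual messages from a first-time FileStore mkfs rather than real errors. aio is disabled because the journal is a plain file rather than a block device. The superblock lookup fails because the object store is brand new. The missing keyring is reported just before a new one is created. One way to double-check the result is to look at the files mkfs wrote. The helper below is a hypothetical sketch that assumes the FileStore data-directory layout, in which `whoami` holds the OSD ID, `fsid` holds the --osd-uuid value and `keyring` holds the new key.

```shell
# Hypothetical helper: sanity-check an OSD data directory after
# "ceph-osd --mkfs --mkkey". Assumes the FileStore layout, where mkfs writes
# the OSD id to "whoami", the --osd-uuid to "fsid" and the key to "keyring".
# $1 = data dir, $2 = expected OSD id, $3 = expected --osd-uuid
check_osd_dir() {
  local dir=$1 id=$2 uuid=$3
  [ -s "$dir/keyring" ] || { echo "no keyring in $dir" >&2; return 1; }
  [ "$(cat "$dir/whoami" 2>/dev/null)" = "$id" ] || { echo "whoami is not $id" >&2; return 1; }
  [ "$(cat "$dir/fsid" 2>/dev/null)" = "$uuid" ] || { echo "fsid is not $uuid" >&2; return 1; }
  echo "osd.$id data dir looks initialized"
}
```

Usage: check_osd_dir /var/lib/ceph/osd/ceph-4 4 25c21708-f756-4593-bc9e-c5506622cf07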

thanks

-----Original Message-----
From: Cary [mailto:dynamic.cary@xxxxxxxxx] 
Sent: Thursday, December 14, 2017 7:13 PM
To: James Okken
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

James,

Usually, once the misplaced data has been rebalanced, the cluster should return to a healthy state. If you run "ceph health detail", Ceph will show you more detail about what is happening. Is Ceph still recovering, or has it stalled? Has the "objects misplaced (62.511%)" figure dropped to a lower percentage?
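To answer the "has it changed" question without eyeballing the output, the misplaced percentage can be pulled out of ceph -s and printed periodically. This is a hypothetical sketch: it assumes the Jewel-era status text shown in this thread ("N/M objects misplaced (P%)"), and the helper names are made up.

```shell
# Hypothetical helper: read "ceph -s" output on stdin and print the first
# "objects misplaced (N%)" percentage found; prints nothing if there is none.
misplaced_pct() {
  sed -n 's/.*objects misplaced (\([0-9.]*\)%).*/\1/p' | head -n 1
}

# Print the misplaced percentage every $1 seconds (default 60)
# until ceph -s stops reporting misplaced objects.
watch_misplaced() {
  local pct
  while pct=$(ceph -s | misplaced_pct) && [ -n "$pct" ]; do
    echo "$(date '+%H:%M:%S') misplaced: ${pct}%"
    sleep "${1:-60}"
  done
  echo "no misplaced objects reported"
}
```

A value that stays fixed for a long time would suggest recovery has stalled, at which point ceph health detail is the next thing to look at.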

Cary
-Dynamic

On Thu, Dec 14, 2017 at 10:52 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:
> Thanks Cary!
>
> Your directions worked on my first server (once I found the missing line break in your list of commands; the email must have mangled it).
>
> For anyone else, this line:
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
> is really 2 commands:
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>
> Cary, what should I look for in ceph -w and ceph -s to see the status of the data movement?
> It seems like the data is moving, but also that I have some issue...
>
> root@node-53:~# ceph -w
>     cluster 2b9f7957-d0db-481e-923e-89972f6c594f
>      health HEALTH_WARN
>             176 pgs backfill_wait
>             1 pgs backfilling
>             27 pgs degraded
>             1 pgs recovering
>             26 pgs recovery_wait
>             27 pgs stuck degraded
>             204 pgs stuck unclean
>             recovery 10322/84644 objects degraded (12.195%)
>             recovery 52912/84644 objects misplaced (62.511%)
>      monmap e3: 3 mons at {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
>             election epoch 138, quorum 0,1,2 node-45,node-44,node-43
>      osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
>             flags sortbitwise,require_jewel_osds
>       pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
>             370 GB used, 5862 GB / 6233 GB avail
>             10322/84644 objects degraded (12.195%)
>             52912/84644 objects misplaced (62.511%)
>                  308 active+clean
>                  176 active+remapped+wait_backfill
>                   26 active+recovery_wait+degraded
>                    1 active+remapped+backfilling
>                    1 active+recovering+degraded
>   recovery io 100605 kB/s, 14 objects/s
>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr
>
> 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 activating, 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 307 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s; 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced (62.511%); 12224 kB/s, 2 objects/s recovering
> 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 100605 kB/s, 14 objects/s recovering
> 2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 434 kB/s wr, 45 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 84234 kB/s, 10 objects/s recovering
> 2017-12-14 22:46:02.482228 mon.0 [INF] pgmap v3936177: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 334 kB/s wr
>
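In the ceph -s output above, the lines at the bottom give a PG count per state. Everything that is not active+clean is still waiting for, or in the middle of, backfill or recovery, so the useful signal is that total shrinking over time. Here it is 176 + 26 + 1 + 1 = 204, which matches the "204 pgs stuck unclean" line (308 + 204 = 512 PGs in total). A hypothetical helper that computes this total, assuming the Jewel-style layout with one "count state" pair per line (newer releases format ceph -s differently):

```shell
# Hypothetical helper: sum the per-state PG counts in Jewel-style "ceph -s"
# output (lines such as "176 active+remapped+wait_backfill") for every state
# other than active+clean. Lines like "176 pgs backfill_wait" have three
# fields and are skipped.
pgs_not_clean() {
  awk 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[a-z_+]+$/ && $2 != "active+clean" { n += $1 }
       END { print n + 0 }'
}
```

Usage: ceph -s | pgs_not_clean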
>
> -----Original Message-----
> From: Cary [mailto:dynamic.cary@xxxxxxxxx]
> Sent: Thursday, December 14, 2017 4:21 PM
> To: James Okken
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)
>
> Jim,
>
> I am not an expert, but I believe I can assist.
>
> Normally you will have only one OSD per drive. I have heard discussions about running multiple OSDs per disk when using SSDs, though.
>
> Once your drives have been installed, you will have to partition and format them, unless you are using BlueStore. My steps for formatting are below.
> Replace sXX with your drive name.
>
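> # parted runs interactively; the five lines after it are typed at the (parted) prompt.
> # An end of -1 means "the end of the disk".
> parted -a optimal /dev/sXX
> print
> mklabel gpt
> unit mib
> mkpart OSD4sdd1 1 -1
> quit
> mkfs.xfs -f /dev/sXX1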
>
> # Run blkid, and copy the UUID for the newly formatted drive.
> blkid
> # Add the mount point/UUID to fstab. The mount point will be created later.
> vi /etc/fstab
> # For example
> UUID=6386bac4-7fef-3cd2-7d64-13db51d83b12 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64,logbufs=8 0 0
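To avoid copy/paste mistakes with the UUID, the fstab entry can be generated from blkid (blkid -s UUID -o value /dev/sXX1 prints just the UUID). The helper names and the optional fstab-path argument below are hypothetical; the mount options are the ones from the example line above.

```shell
# Hypothetical helper: print an fstab entry for an OSD data partition.
# $1 = filesystem UUID, $2 = OSD id
fstab_line() {
  printf 'UUID=%s /var/lib/ceph/osd/ceph-%s xfs rw,noatime,inode64,logbufs=8 0 0\n' "$1" "$2"
}

# Append the entry unless that UUID is already present.
# $3 = fstab path (defaults to /etc/fstab)
add_fstab_entry() {
  local fstab=${3:-/etc/fstab}
  if grep -q "^UUID=$1 " "$fstab"; then
    echo "UUID $1 already in $fstab" >&2
    return 1
  fi
  fstab_line "$1" "$2" >> "$fstab"
}
```

Usage: add_fstab_entry "$(blkid -s UUID -o value /dev/sXX1)" 4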
>
>
> # You can then add the OSD to the cluster.
>
> uuidgen
> # Replace the UUID below with the UUID that was created with uuidgen.
> ceph osd create 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
>
> # Note the OSD number this command prints; it is usually the lowest unused OSD ID.
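Since ceph osd create prints the new OSD number, the UUID and the ID can be captured in shell variables instead of being copied by hand. new_osd is a hypothetical helper name, and this assumes the Jewel-era "ceph osd create [uuid]" form used above.

```shell
# Hypothetical helper: generate a UUID, register it with "ceph osd create"
# (which prints the new OSD number) and print "<uuid> <osd id>".
new_osd() {
  local uuid id
  uuid=$(uuidgen) || return 1
  id=$(ceph osd create "$uuid") || return 1
  echo "$uuid $id"
}
```

Usage (bash): read -r OSD_UUID OSD_ID <<< "$(new_osd)", then use $OSD_ID and $OSD_UUID in the later steps.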
>
> # Add osd.4 to ceph.conf on all Ceph nodes.
> vi /etc/ceph/ceph.conf
> ...
> [osd.4]
> public addr = 172.1.3.1
> cluster addr = 10.1.3.1
> ...
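If you are adding several OSDs, the ceph.conf section can be printed from the OSD ID and the two addresses instead of being typed by hand. A hypothetical helper; the addresses in the usage line are the example ones above.

```shell
# Hypothetical helper: print a ceph.conf section for one OSD.
# $1 = OSD id, $2 = public addr, $3 = cluster addr
osd_conf_section() {
  printf '[osd.%s]\npublic addr = %s\ncluster addr = %s\n' "$1" "$2" "$3"
}
```

Usage: osd_conf_section 4 172.1.3.1 10.1.3.1 >> /etc/ceph/ceph.conf, then make the same change on the other Ceph nodes.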
>
> # Now add the mount point.
> mkdir -p /var/lib/ceph/osd/ceph-4
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
>
> # The command below mounts everything in fstab.
> mount -a
> # Change the number after -i to the correct OSD ID, and change --osd-uuid to the UUID generated with uuidgen above.
> # Your keyring location may be different and need to be changed as well (ceph-osd --mkkey normally writes it to /var/lib/ceph/osd/ceph-4/keyring).
> ceph-osd -i 4 --mkfs --mkkey --osd-uuid 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
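The mount/mkfs/auth steps above can be condensed into one function. This is a sketch under a few assumptions: the fstab entry for the mount point already exists; it mounts just that entry with "mount <dir>" instead of mount -a; and the keyring passed to ceph auth add is the one ceph-osd --mkkey writes into the data directory (James's mkfs output in this thread shows it being created at /var/lib/ceph/osd/ceph-4/keyring). init_osd_dir and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper condensing the steps above for one OSD.
# $1 = OSD id printed by "ceph osd create", $2 = the UUID passed to it.
# Set DRY_RUN=1 to print each command instead of running it.
init_osd_dir() {
  local id=$1 uuid=$2
  local dir=/var/lib/ceph/osd/ceph-$id
  local run=${DRY_RUN:+echo}
  $run mkdir -p "$dir" || return 1
  # Mount only this fstab entry (Cary's steps use "mount -a").
  $run mount "$dir" || return 1
  $run ceph-osd -i "$id" --mkfs --mkkey --osd-uuid "$uuid" || return 1
  # mkfs runs as root, so hand the new files to the ceph user afterwards.
  $run chown -R ceph:ceph "$dir" || return 1
  # Register the key that --mkkey wrote into the data directory.
  $run ceph auth add "osd.$id" osd 'allow *' mon 'allow profile osd' -i "$dir/keyring"
}
```

Running DRY_RUN=1 init_osd_dir 4 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1 first shows exactly what would be executed.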
>
> # Add the new OSD to its host in the crush map.
> ceph osd crush add osd.4 .0 host=YOURhostNAME
>
> # Because the weight used in the previous step was 0, you will need to increase it. I use 1 for a 1TB drive and 5 for a 5TB drive. The command below reweights osd.4 to 1. You may want to ramp this number up slowly, e.g. 0.10, then 0.20, and so on.
> ceph osd crush reweight osd.4 1
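Cary's rule of thumb uses the drive size in TB. For comparison, the existing 836G OSDs in Jim's ceph osd df output have a WEIGHT of 0.81689, which is their size in TiB (2^40 bytes); on that scale a 4TB (4 x 10^12 byte) drive comes out at about 3.638. Either scale works as long as all OSDs use the same one, since CRUSH places data according to relative weights. A hypothetical sketch for computing the TiB weight and a list of intermediate weights for a gradual ramp-up:

```shell
# Hypothetical helper: device size in bytes -> size in TiB (5 decimals).
tib_weight() {
  awk -v b="$1" 'BEGIN { printf "%.5f\n", b / (1024 ^ 4) }'
}

# Hypothetical helper: print step, 2*step, ... while below the target
# (2 decimals), then the target itself. $1 = target weight, $2 = step.
ramp_steps() {
  awk -v t="$1" -v s="$2" 'BEGIN {
    for (i = 1; i * s < t - 1e-9; i++) printf "%.2f\n", i * s
    print t
  }'
}
```

For example, ramp_steps "$(tib_weight "$(blockdev --getsize64 /dev/sXX)")" 0.5 lists the weights to pass to ceph osd crush reweight osd.4 one at a time, letting backfill finish in between.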
>
> You should now be able to start the OSD daemon (for osd.4, systemctl start ceph-osd@4). You can watch the data move to the drive with ceph -w. Once the data has finished migrating to the drive, start the next one.
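Rather than watching ceph -w until the data has moved, a script can poll ceph health, which prints a one-line status starting with HEALTH_OK, HEALTH_WARN or HEALTH_ERR. A hypothetical sketch; note that while the noout flag is set the cluster reports HEALTH_WARN, so this would not return until the flag is cleared.

```shell
# Hypothetical helper: poll "ceph health" until it reports HEALTH_OK,
# printing the current status between polls. $1 = seconds between polls.
wait_for_health_ok() {
  local interval=${1:-60} status
  while status=$(ceph health); do
    case $status in
      HEALTH_OK*) return 0 ;;
    esac
    echo "$(date '+%H:%M:%S') $status"
    sleep "$interval"
  done
  return 1   # "ceph health" itself failed
}
```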
>
> Cary
> -Dynamic
>
> On Thu, Dec 14, 2017 at 5:34 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:
>> Hi all,
>>
>> Please let me know if I am missing steps or using the wrong steps.
>>
>> I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each of the 3 servers in the cluster.
>>
>> I also need to change my replication factor from 1 to 3.
>> This is part of an OpenStack environment deployed by Fuel, and I had foolishly set my replication factor to 1 in the Fuel settings before deploying. I know this would have been better done at the beginning, but I want to keep the current cluster rather than start over. I know this is going to thrash my cluster for a while as it replicates, but there isn't much data on it yet.
>>
>>
>> To start, I need to safely turn off each CEPH server and add the 4TB drive.
>> To do that I am going to run:
>> ceph osd set noout
>> systemctl stop ceph-osd@1   (or 2 or 3 on the other servers)
>> ceph osd tree   (to verify it is down)
>> poweroff, install the 4TB drive, boot up again
>> ceph osd unset noout
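For the "verify it is down" step, the up/down column for one OSD can be picked out of ceph osd tree. A hypothetical helper; it looks for the exact name osd.N and prints the field after it, which assumes the column order shown by Jewel (NAME followed by UP/DOWN).

```shell
# Hypothetical helper: read "ceph osd tree" output on stdin and print the
# field that follows the exact name "osd.$1" (the up/down column).
osd_state() {
  awk -v name="osd.$1" '{ for (i = 1; i < NF; i++) if ($i == name) print $(i + 1) }'
}
```

Usage: ceph osd tree | osd_state 1 should print down once ceph-osd@1 has stopped.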
>>
>>
>>
>> The next step would be to get CEPH to use the 4TB drives. Each CEPH server already has an 836GB OSD.
>>
>> ceph> osd df
>> ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
>>  0 0.81689  1.00000  836G 101G  734G 12.16 0.90 167
>>  1 0.81689  1.00000  836G 115G  721G 13.76 1.02 166
>>  2 0.81689  1.00000  836G 121G  715G 14.49 1.08 179
>>               TOTAL 2509G 338G 2171G 13.47
>> MIN/MAX VAR: 0.90/1.08  STDDEV: 0.97
>>
>> ceph> df
>> GLOBAL:
>>     SIZE      AVAIL     RAW USED     %RAW USED
>>     2509G     2171G         338G         13.47
>> POOLS:
>>     NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
>>     rbd         0         0         0         2145G           0
>>     images      1      216G      9.15         2145G       27745
>>     backups     2         0         0         2145G           0
>>     volumes     3      114G      5.07         2145G       29717
>>     compute     4         0         0         2145G           0
>>
>>
>> Once the 4TB drive is in each CEPH server, should I look at increasing the size of the current OSD (i.e. to 4836GB)?
>> Or should I create a second 4000GB OSD on each CEPH server?
>> If I am going to create a second OSD on each CEPH server, I hope to use this doc:
>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
>>
>>
>>
>> As far as changing the replication factor from 1 to 3:
>> Here are my pools now:
>>
>> ceph osd pool ls detail
>> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
>> pool 1 'images' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 116 flags hashpspool stripe_width 0
>>         removed_snaps [1~3,b~6,12~8,20~2,24~6,2b~8,34~2,37~20]
>> pool 2 'backups' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 7 flags hashpspool stripe_width 0
>> pool 3 'volumes' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 73 flags hashpspool stripe_width 0
>>         removed_snaps [1~3]
>> pool 4 'compute' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 34 flags hashpspool stripe_width 0
>>
>> I plan on using these steps I saw online:
>> ceph osd pool set rbd size 3
>> ceph -s   (verify that replication completes successfully)
>> ceph osd pool set images size 3
>> ceph -s
>> ceph osd pool set backups size 3
>> ceph -s
>> ceph osd pool set volumes size 3
>> ceph -s
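Two things stand out in the plan above. First, the compute pool is also listed at size 1 in ceph osd pool ls detail but is missing from the list (it is empty, so raising it costs nothing). Second, all pools are at min_size 1. min_size 2 is the usual companion to size 3, but it is safer to raise it only after a pool has finished replicating, because PGs with fewer than min_size copies stop serving I/O. A hypothetical sketch that handles one pool at a time and waits for HEALTH_OK in between (it keeps polling if the cluster never gets there, e.g. while noout is set):

```shell
# Hypothetical helper: for each pool, raise "size", wait until "ceph health"
# reports HEALTH_OK, then raise "min_size".
# $1 = size, $2 = min_size, remaining args = pool names.
# POLL_SECONDS (default 60) sets the polling interval.
raise_pool_size() {
  local size=$1 min_size=$2 pool
  shift 2
  for pool in "$@"; do
    ceph osd pool set "$pool" size "$size" || return 1
    until ceph health | grep -q '^HEALTH_OK'; do
      sleep "${POLL_SECONDS:-60}"
    done
    ceph osd pool set "$pool" min_size "$min_size" || return 1
  done
}
```

Usage: raise_pool_size 3 2 rbd images backups volumes compute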
>>
>>
>> Please let me know of any advice or better methods...
>>
>> thanks
>>
>> --Jim
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com