Re: add hard drives to 3 CEPH servers (3 server cluster)

Ronny Aasen <ronny+ceph-users@xxxxxxxx> · Fri, 15 Dec 2017 23:11:56 +0100

If you have a global setting in ceph.conf, it will only affect the
creation of new pools. I recommend using the default
size:3 + min_size:2.

Also check that your pools have min_size=2.

kind regards
Ronny Aasen
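
A minimal sketch of the per-pool check suggested above. It reads "ceph osd pool ls detail" text on stdin and prints the size and min_size of each pool. The helper name pool_replication is made up for illustration, and the sample line is copied from the pool listing in James's original message quoted further down this thread.

```shell
# Hypothetical helper: summarize size/min_size per pool from
# `ceph osd pool ls detail` output read on stdin.
pool_replication() {
  awk '$1 == "pool" {
    name = $3
    gsub("\047", "", name)            # strip the single quotes around the pool name
    size = ""; min = ""
    for (i = 4; i <= NF; i++) {
      if ($i == "size")     size = $(i + 1)
      if ($i == "min_size") min  = $(i + 1)
    }
    print name, "size=" size, "min_size=" min
  }'
}

# Sample line from the thread (the cluster before the change).
printf '%s\n' \
  "pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0" \
  | pool_replication
# prints: rbd size=1 min_size=1
```

On a live cluster this would be fed with "ceph osd pool ls detail | pool_replication". A pool that still reports min_size=1 can be changed with "ceph osd pool set <pool> min_size 2".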

On 15.12.2017 23:00, James Okken wrote:
This whole effort went extremely well, thanks to Cary, and I'm not used to that with CEPH so far. (And OpenStack ever....)
Thank you Cary.

I've upped the replication factor and now I see "replicated size 3" in each of my pools. Is this the only place to check the replication level? Is there a global setting, or only a setting per pool?

ceph osd pool ls detail
pool 0 'rbd' replicated size 3......
pool 1 'images' replicated size 3...
...

One last question!
At this replication level, how can I tell how much total space I actually have now?
Do I just take 1/3 of the GLOBAL size?

ceph df
GLOBAL:
     SIZE       AVAIL      RAW USED     %RAW USED
     13680G     12998G         682G          4.99
POOLS:
     NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
     rbd         0         0         0         6448G           0
     images      1      216G      3.24         6448G       27745
     backups     2         0         0         6448G           0
     volumes     3      117G      1.79         6448G       30441
     compute     4         0         0         6448G           0

ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR  PGS
  0 0.81689  1.00000   836G 36549M   800G 4.27 0.86  67
  4 3.70000  1.00000  3723G   170G  3553G 4.58 0.92 270
  1 0.81689  1.00000   836G 49612M   788G 5.79 1.16  56
  5 3.70000  1.00000  3723G   192G  3531G 5.17 1.04 282
  2 0.81689  1.00000   836G 33639M   803G 3.93 0.79  58
  3 3.70000  1.00000  3723G   202G  3521G 5.43 1.09 291
               TOTAL 13680G   682G 12998G 4.99
MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
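
As a rough back-of-the-envelope answer to the question above: with every pool at size 3, each object is stored three times, so usable space is approximately the raw figure divided by 3. The sketch below does that integer arithmetic on the GLOBAL numbers from the "ceph df" output above. The helper name usable_gib is made up for illustration, and the estimate ignores full ratios and uneven OSD fill. The MAX AVAIL column in "ceph df" is Ceph's own per-pool estimate and can differ from this figure.

```shell
# Hypothetical helper: raw capacity divided by the replica count.
# Integer arithmetic, so the result is rounded down.
usable_gib() {
  raw_gib=$1
  replicas=$2
  echo $(( raw_gib / replicas ))
}

usable_gib 13680 3   # GLOBAL SIZE  -> prints 4560
usable_gib 12998 3   # GLOBAL AVAIL -> prints 4332
```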

Thanks!

-----Original Message-----
From: Cary [mailto:dynamic.cary@xxxxxxxxx]
Sent: Friday, December 15, 2017 4:05 PM
To: James Okken
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

James,

  Those errors are normal. Ceph creates the missing files. You can check "/var/lib/ceph/osd/ceph-6", before and after you run those commands to see what files are added there.

  Make sure you get the replication factor set.

Cary
-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:
Thanks again Cary,

Yes, once all the backfilling was done I was back to a Healthy cluster.
I moved on to the same steps for the next server in the cluster, it is backfilling now.
Once that is done I will do the last server in the cluster, and then I think I am done!

Just checking on one thing. I get these messages when running this command. I assume this is OK, right?
root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07
2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f
2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: /var/lib/ceph/osd/ceph-4/keyring: can't open /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring /var/lib/ceph/osd/ceph-4/keyring

thanks

-----Original Message-----
From: Cary [mailto:dynamic.cary@xxxxxxxxx]
Sent: Thursday, December 14, 2017 7:13 PM
To: James Okken
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

James,

  Usually, once the misplaced data has balanced out, the cluster should reach a healthy state. If you run "ceph health detail", Ceph will show you more detail about what is happening. Is Ceph still recovering, or has it stalled? Has the "objects misplaced (62.511%)" figure dropped to a lower percentage?
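
One way to follow that percentage is to pull it out of the status text and compare it between runs. This is a minimal sketch. The helper name misplaced_pct is made up for illustration, and the sample line is copied from the status output quoted below.

```shell
# Hypothetical helper: extract the "objects misplaced" percentage
# from `ceph -s` text read on stdin (first match only).
misplaced_pct() {
  sed -n 's/.*objects misplaced (\([0-9.]*\)%).*/\1/p' | head -n 1
}

# Sample line from the thread.
echo "recovery 52912/84644 objects misplaced (62.511%)" | misplaced_pct
# prints: 62.511
```

On a live cluster this would be run as "ceph -s | misplaced_pct" every so often. A number that keeps falling suggests recovery is still progressing, while one that stays flat over a long period suggests it has stalled.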

Cary
-Dynamic

On Thu, Dec 14, 2017 at 10:52 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:
Thanks Cary!

Your directions worked on my first server (once I found the missing carriage return in your list of commands; the email must have messed it up).

For anyone else, this line:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
really is 2 commands:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

Cary, what am I looking for in ceph -w and ceph -s to show the status of the data moving?
Seems like the data is moving and that I have some issue...

root@node-53:~# ceph -w
     cluster 2b9f7957-d0db-481e-923e-89972f6c594f
      health HEALTH_WARN
             176 pgs backfill_wait
             1 pgs backfilling
             27 pgs degraded
             1 pgs recovering
             26 pgs recovery_wait
             27 pgs stuck degraded
             204 pgs stuck unclean
             recovery 10322/84644 objects degraded (12.195%)
             recovery 52912/84644 objects misplaced (62.511%)
      monmap e3: 3 mons at {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
             election epoch 138, quorum 0,1,2 node-45,node-44,node-43
      osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
             flags sortbitwise,require_jewel_osds
       pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
             370 GB used, 5862 GB / 6233 GB avail
             10322/84644 objects degraded (12.195%)
             52912/84644 objects misplaced (62.511%)
                  308 active+clean
                  176 active+remapped+wait_backfill
                   26 active+recovery_wait+degraded
                    1 active+remapped+backfilling
                    1 active+recovering+degraded
recovery io 100605 kB/s, 14 objects/s
   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr

2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 activating, 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 307 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s; 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced (62.511%); 12224 kB/s, 2 objects/s recovering
2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 100605 kB/s, 14 objects/s recovering
2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 434 kB/s wr, 45 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 84234 kB/s, 10 objects/s recovering
2017-12-14 22:46:02.482228 mon.0 [INF] pgmap v3936177: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 334 kB/s wr

-----Original Message-----
From: Cary [mailto:dynamic.cary@xxxxxxxxx]
Sent: Thursday, December 14, 2017 4:21 PM
To: James Okken
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

Jim,

I am not an expert, but I believe I can assist.

  Normally you will have only 1 OSD per drive, though I have heard discussions about running multiple OSDs per disk when using SSDs.

  Once your drives have been installed you will have to format them, unless you are using Bluestore. My steps for formatting are below.
Replace the sXX with your drive name.

parted -a optimal /dev/sXX
print
mklabel gpt
unit mib
mkpart OSD4sdd1 1 -1
quit
mkfs.xfs -f /dev/sXX1

# Run blkid, and copy the UUID for the newly formatted drive.
blkid
# Add the mount point/UUID to fstab. The mount point will be created later.
vi /etc/fstab
# For example
UUID=6386bac4-7fef-3cd2-7d64-13db51d83b12 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64,logbufs=8 0 0

# You can then add the OSD to the cluster.

uuidgen
# Replace the UUID below with the UUID that was created with uuidgen.
ceph osd create 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1

# Note the OSD number it creates; it is usually the lowest OSD ID available.

# Add osd.4 to ceph.conf on all Ceph nodes.
vi /etc/ceph/ceph.conf
...
[osd.4]
public addr = 172.1.3.1
cluster addr = 10.1.3.1
...

# Now add the mount point.
mkdir -p /var/lib/ceph/osd/ceph-4
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4

# The command below mounts everything in fstab.
mount -a
# The number after -i below needs to be changed to the correct OSD ID, and the osd-uuid needs to be changed to the UUID created with uuidgen above.
# Your keyring location may be different and need to be changed as well.
ceph-osd -i 4 --mkfs --mkkey --osd-uuid 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

# Add the new OSD to its host in the crush map.
ceph osd crush add osd.4 .0 host=YOURhostNAME

# Since the weight used in the previous step was .0, you will need to increase it. I use 1 for a 1TB drive and 5 for a 5TB drive. The command below will reweight osd.4 to 1. You may need to ramp this number up slowly, e.g. .10, then .20, etc.
ceph osd crush reweight osd.4 1

You should now be able to start the drive. You can watch the data move to the drive with a ceph -w. Once data has migrated to the drive, start the next.
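
The gradual ramp-up mentioned above could be scripted along these lines. This is only a sketch: the helper name ramp_commands, the target weight and the number of steps are illustrative assumptions, and it prints the reweight commands rather than running them, so each one can be issued by hand once the cluster has settled after the previous step.

```shell
# Hypothetical helper: print (do not run) evenly spaced
# `ceph osd crush reweight` commands up to a target weight.
ramp_commands() {
  osd=$1      # OSD ID, e.g. 4
  target=$2   # final CRUSH weight, e.g. 1 for a 1TB drive
  steps=$3    # number of increments
  i=1
  while [ "$i" -le "$steps" ]; do
    w=$(awk -v t="$target" -v i="$i" -v n="$steps" 'BEGIN { printf "%.2f", t * i / n }')
    echo "ceph osd crush reweight osd.$osd $w"
    i=$((i + 1))
  done
}

ramp_commands 4 1 4
# prints:
# ceph osd crush reweight osd.4 0.25
# ceph osd crush reweight osd.4 0.50
# ceph osd crush reweight osd.4 0.75
# ceph osd crush reweight osd.4 1.00
```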

Cary
-Dynamic

On Thu, Dec 14, 2017 at 5:34 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:
Hi all,

Please let me know if I am missing steps or using the wrong steps

I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each of the 3 servers in the cluster.

I also need to change my replication factor from 1 to 3.
This is part of an OpenStack environment deployed by Fuel, and I had foolishly set my replication factor to 1 in the Fuel settings before deploying. I know this would have been better done at the beginning. I do want to keep the current cluster and not start over. I know this is going to thrash my cluster for a while replicating, but there isn't too much data on it yet.

To start I need to safely turn off each CEPH server and add in the 4TB drive:
To do that I am going to run:
ceph osd set noout
systemctl stop ceph-osd@1 (or 2 or 3 on the other servers)
ceph osd tree (to verify it is down)
poweroff, install the 4TB drive, bootup again
ceph osd unset noout

Next step would be to get CEPH to use the 4TB drives. Each CEPH server already has an 836GB OSD.

ceph> osd df
ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
  0 0.81689  1.00000  836G 101G  734G 12.16 0.90 167
  1 0.81689  1.00000  836G 115G  721G 13.76 1.02 166
  2 0.81689  1.00000  836G 121G  715G 14.49 1.08 179
               TOTAL 2509G 338G 2171G 13.47
MIN/MAX VAR: 0.90/1.08  STDDEV: 0.97

ceph> df
GLOBAL:
     SIZE      AVAIL     RAW USED     %RAW USED
     2509G     2171G         338G         13.47
POOLS:
     NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
     rbd         0         0         0         2145G           0
     images      1      216G      9.15         2145G       27745
     backups     2         0         0         2145G           0
     volumes     3      114G      5.07         2145G       29717
     compute     4         0         0         2145G           0

Once I get the 4TB drive into each CEPH server, should I look at enlarging the current OSD (i.e. to 4836GB)?
Or create a second 4000GB OSD on each CEPH server?
If I am going to create a second OSD on each CEPH server I hope to use this doc:
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

As far as changing the replication factor from 1 to 3:
Here are my pools now:

ceph osd pool ls detail
pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'images' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 116 flags hashpspool stripe_width 0
         removed_snaps [1~3,b~6,12~8,20~2,24~6,2b~8,34~2,37~20]
pool 2 'backups' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 7 flags hashpspool stripe_width 0
pool 3 'volumes' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 73 flags hashpspool stripe_width 0
         removed_snaps [1~3]
pool 4 'compute' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 34 flags hashpspool stripe_width 0

I plan on using these steps I saw online:
ceph osd pool set rbd size 3
ceph -s  (Verify that replication completes successfully)
ceph osd pool set images size 3
ceph -s
ceph osd pool set backups size 3
ceph -s
ceph osd pool set volumes size 3
ceph -s
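
The same plan can be written as a loop. This is a sketch under a few assumptions: the helper name replication_commands is made up, the pool names are the five from the listing above (the plan as written does not mention the 'compute' pool), and setting min_size to 2 follows the recommendation in the reply at the top of this thread. It only prints the commands, so each pool can be changed by hand and "ceph -s" checked before moving on to the next.

```shell
# Hypothetical helper: print (do not run) the size/min_size
# commands for every pool name given as an argument.
replication_commands() {
  for pool in "$@"; do
    echo "ceph osd pool set $pool size 3"
    echo "ceph osd pool set $pool min_size 2"
  done
}

replication_commands rbd images backups volumes compute
```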

please let me know any advice or better methods...

thanks

--Jim

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com