Re: add hard drives to 3 CEPH servers (3 server cluster)

David Turner <drakonstein@xxxxxxxxx> · Fri, 15 Dec 2017 22:12:22 +0000

In conjunction with increasing the pool size to 3, also increase the pool min_size to 2.  `ceph df` and `ceph osd df` will eventually show the full size in use in your cluster.  In particular, the available size that `ceph df` reports for a pool takes the pool's replication size into account.  Keep watching `ceph -s` or `ceph -w` to see when the backfilling triggered by your change to replication size finishes.
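
A minimal sketch of applying both settings to every pool. The helper name `set_replication` and the `CEPH` override variable are illustrative assumptions, not part of Ceph; the underlying command is `ceph osd pool set <pool> size|min_size <n>`.

```shell
#!/bin/sh
# CEPH can be overridden (for example CEPH=echo) to preview the commands
# without touching the cluster. This variable is an assumption of the sketch.
CEPH="${CEPH:-ceph}"

# set_replication SIZE MIN_SIZE POOL...
# Sets size and then min_size on each named pool, in order.
set_replication() {
    size="$1"
    min_size="$2"
    shift 2
    for pool in "$@"; do
        $CEPH osd pool set "$pool" size "$size"
        $CEPH osd pool set "$pool" min_size "$min_size"
    done
}

# Example invocation (pool names taken from this thread):
#   set_replication 3 2 rbd images backups volumes compute
```

Running it once with `CEPH=echo` prints the commands instead of executing them, which is a cheap way to check the pool list first.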

On Fri, Dec 15, 2017 at 5:06 PM James Okken <James.Okken@xxxxxxxxxxxx> wrote:
This whole effort went extremely well, thanks to Cary, and I'm not used to that with CEPH so far. (And OpenStack ever....)

Thank you Cary.

I've upped the replication factor and now I see "replicated size 3" in each of my pools. Is this the only place to check the replication level? Is there a global setting, or only a setting per pool?

ceph osd pool ls detail

pool 0 'rbd' replicated size 3......

pool 1 'images' replicated size 3...

...
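
One way to check every pool at once is to filter the `ceph osd pool ls detail` text. This is only a sketch: the helper name `pools_not_at_size` is made up for illustration, and it assumes the line layout shown in this thread (`pool <id> '<name>' replicated size <n> ...`). A single pool can also be queried with `ceph osd pool get <pool> size`.

```shell
#!/bin/sh
# pools_not_at_size WANT
# Reads `ceph osd pool ls detail` text on stdin and prints the name of every
# pool whose "size" field differs from WANT. The "min_size" field is ignored
# because the match on the word "size" is exact.
pools_not_at_size() {
    awk -v want="$1" -v q="'" '
        $1 == "pool" {
            for (i = 1; i < NF; i++) {
                if ($i == "size" && $(i + 1) != want) {
                    name = $3
                    gsub(q, "", name)   # drop the quotes around the pool name
                    print name
                }
            }
        }'
}

# Example (needs a live cluster):
#   ceph osd pool ls detail | pools_not_at_size 3
```

No output means every pool is at the wanted size.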

One last question!

At this replication level how can I tell how much total space I actually have now?

Do I just 1/3 the Global size?

ceph df

GLOBAL:

    SIZE       AVAIL      RAW USED     %RAW USED

    13680G     12998G         682G          4.99

POOLS:

    NAME        ID     USED     %USED     MAX AVAIL     OBJECTS

    rbd         0         0         0         6448G           0

    images      1      216G      3.24         6448G       27745

    backups     2         0         0         6448G           0

    volumes     3      117G      1.79         6448G       30441

    compute     4         0         0         6448G           0

ceph osd df

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR  PGS

 0 0.81689  1.00000   836G 36549M   800G 4.27 0.86  67

 4 3.70000  1.00000  3723G   170G  3553G 4.58 0.92 270

 1 0.81689  1.00000   836G 49612M   788G 5.79 1.16  56

 5 3.70000  1.00000  3723G   192G  3531G 5.17 1.04 282

 2 0.81689  1.00000   836G 33639M   803G 3.93 0.79  58

 3 3.70000  1.00000  3723G   202G  3521G 5.43 1.09 291

              TOTAL 13680G   682G 12998G 4.99

MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
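
As a back-of-the-envelope check on the "1/3 the Global size" question, the arithmetic can be sketched as below. The helper name `usable_gb` is an illustrative assumption. The result is only a rough upper bound: it ignores full ratios and uneven OSD fill, so the per-pool MAX AVAIL column in `ceph df` remains the figure to trust once backfilling has finished.

```shell
#!/bin/sh
# usable_gb RAW_GB REPLICAS
# Rough upper bound on usable space: raw capacity divided by the replica
# count, using integer division.
usable_gb() {
    echo $(( $1 / $2 ))
}

# Raw SIZE from the `ceph df` output above, with pool size 3.
usable_gb 13680 3    # prints 4560
```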

Thanks!

-----Original Message-----

From: Cary [mailto:dynamic.cary@xxxxxxxxx]

Sent: Friday, December 15, 2017 4:05 PM

To: James Okken

Cc: ceph-users@xxxxxxxxxxxxxx

Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

James,

 Those errors are normal. Ceph creates the missing files. You can check "/var/lib/ceph/osd/ceph-6", before and after you run those commands to see what files are added there.

 Make sure you get the replication factor set.

Cary

-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:

> Thanks again Cary,

>

> Yes, once all the backfilling was done I was back to a Healthy cluster.

> I moved on to the same steps for the next server in the cluster, it is backfilling now.

> Once that is done I will do the last server in the cluster, and then I think I am done!

>

> Just checking on one thing. I get these messages when running this command. I assume this is OK, right?

> root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07

> 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway

> 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway

> 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory

> 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f

> 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: /var/lib/ceph/osd/ceph-4/keyring: can't open /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory

> 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring /var/lib/ceph/osd/ceph-4/keyring

>

> thanks

>

> -----Original Message-----

> From: Cary [mailto:dynamic.cary@xxxxxxxxx]

> Sent: Thursday, December 14, 2017 7:13 PM

> To: James Okken

> Cc: ceph-users@xxxxxxxxxxxxxx

> Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

>

> James,

>

>  Usually, once the misplaced data has balanced out, the cluster should reach a healthy state. If you run "ceph health detail", Ceph will show you more detail about what is happening.  Is Ceph still recovering, or has it stalled? Has the "objects misplaced (62.511%)" figure changed to a lower %?
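
A small sketch for tracking that figure over time. The helper name `misplaced_pct` is made up for illustration, and the pattern assumes the status text format shown in this thread ("... objects misplaced (62.511%)"); other Ceph releases may word it differently.

```shell
#!/bin/sh
# misplaced_pct
# Reads `ceph -s` text on stdin and prints the first misplaced percentage it
# finds, without the trailing percent sign. Prints nothing if no such line
# is present.
misplaced_pct() {
    sed -n 's/.*objects misplaced (\([0-9.]*\)%).*/\1/p' | head -n 1
}

# Example (needs a live cluster):
#   ceph -s | misplaced_pct
```

Running it every few minutes and seeing the number fall suggests recovery is progressing rather than stalled.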

>

> Cary

> -Dynamic

>

> On Thu, Dec 14, 2017 at 10:52 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:

>> Thanks Cary!

>>

>> Your directions worked on my first server (once I found the missing carriage return in your list of commands; the email must have messed it up).

>>

>> For anyone else:

>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

>> really is 2 commands:

>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4

>> and

>> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

>>

>> Cary, what am I looking for in ceph -w and ceph -s to show the status of the data moving?

>> Seems like the data is moving and that I have some issue...

>>

>> root@node-53:~# ceph -w

>>     cluster 2b9f7957-d0db-481e-923e-89972f6c594f

>>      health HEALTH_WARN

>>             176 pgs backfill_wait

>>             1 pgs backfilling

>>             27 pgs degraded

>>             1 pgs recovering

>>             26 pgs recovery_wait

>>             27 pgs stuck degraded

>>             204 pgs stuck unclean

>>             recovery 10322/84644 objects degraded (12.195%)

>>             recovery 52912/84644 objects misplaced (62.511%)

>>      monmap e3: 3 mons at {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}

>>             election epoch 138, quorum 0,1,2 node-45,node-44,node-43

>>      osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs

>>             flags sortbitwise,require_jewel_osds

>>       pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects

>>             370 GB used, 5862 GB / 6233 GB avail

>>             10322/84644 objects degraded (12.195%)

>>             52912/84644 objects misplaced (62.511%)

>>                  308 active+clean

>>                  176 active+remapped+wait_backfill

>>                   26 active+recovery_wait+degraded

>>                    1 active+remapped+backfilling

>>                    1 active+recovering+degraded

>>   recovery io 100605 kB/s, 14 objects/s

>>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr

>>

>> 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 activating, 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 307 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s; 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced (62.511%); 12224 kB/s, 2 objects/s recovering

>> 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 100605 kB/s, 14 objects/s recovering

>> 2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 434 kB/s wr, 45 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 84234 kB/s, 10 objects/s recovering

>> 2017-12-14 22:46:02.482228 mon.0 [INF] pgmap v3936177: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 334 kB/s wr

>>

>>

>> -----Original Message-----

>> From: Cary [mailto:dynamic.cary@xxxxxxxxx]

>> Sent: Thursday, December 14, 2017 4:21 PM

>> To: James Okken

>> Cc: ceph-users@xxxxxxxxxxxxxx

>> Subject: Re:  add hard drives to 3 CEPH servers (3 server cluster)

>>

>> Jim,

>>

>> I am not an expert, but I believe I can assist.

>>

>>  Normally you will only have 1 OSD per drive. I have heard discussions about using multiple OSDs per disk, when using SSDs though.

>>

>>  Once your drives have been installed you will have to format them, unless you are using Bluestore. My steps for formatting are below.

>> Replace the sXX with your drive name.

>>

>> parted -a optimal /dev/sXX

>> print

>> mklabel gpt

>> unit mib

>> mkpart OSD4sdd1 1 -1

>> quit

>> mkfs.xfs -f /dev/sXX1

>>

>> # Run blkid, and copy the UUID for the newly formatted drive.

>> blkid

>> # Add the mount point/UUID to fstab. The mount point will be created later.

>> vi /etc/fstab

>> # For example

>> UUID=6386bac4-7fef-3cd2-7d64-13db51d83b12 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64,logbufs=8 0 0

>>

>>

>> # You can then add the OSD to the cluster.

>>

>> uuidgen

>> # Replace the UUID below with the UUID that was created with uuidgen.

>> ceph osd create 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1

>>

>> # Note which OSD number it creates; it is usually the lowest OSD number available.

>>

>> # Add osd.4 to ceph.conf on all Ceph nodes.

>> vi /etc/ceph/ceph.conf

>> ...

>> [osd.4]

>> public addr = 172.1.3.1

>> cluster addr = 10.1.3.1

>> ...

>>

>> # Now add the mount point.

>> mkdir -p /var/lib/ceph/osd/ceph-4

>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4

>>

>> # The command below mounts everything in fstab.

>> mount -a

>> # The number after -i below needs to be changed to the correct OSD ID, and the osd-uuid needs to be changed to the UUID created with uuidgen above.

>> # Your keyring location may be different and need to be changed as well.

>> ceph-osd -i 4 --mkfs --mkkey --osd-uuid 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1

>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4

>> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

>>

>> # Add the new OSD to its host in the crush map.

>> ceph osd crush add osd.4 .0 host=YOURhostNAME

>>

>> # Since the weight used in the previous step was .0, you will need to increase it. I use 1 for a 1TB drive and 5 for a 5TB drive. The command below will reweight osd.4 to 1. You may need to ramp this number up slowly, e.g. .10, then .20, etc.

>> ceph osd crush reweight osd.4 1

>>

>> You should now be able to start the drive. You can watch the data move to the drive with a ceph -w. Once data has migrated to the drive, start the next.
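
The "slowly ramp up" advice above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `ramp_weight` and the `CEPH` and `POLL_SECONDS` variables are illustrative, and waiting for `HEALTH_OK` will never finish while a flag such as `noout` is set or while any other warning is active, so adjust the wait condition to suit the cluster.

```shell
#!/bin/sh
# CEPH and POLL_SECONDS are assumptions of this sketch, not Ceph settings.
CEPH="${CEPH:-ceph}"
POLL_SECONDS="${POLL_SECONDS:-60}"

# ramp_weight OSD WEIGHT...
# Applies each CRUSH weight in turn and waits for the cluster to report
# HEALTH_OK before moving on to the next one.
ramp_weight() {
    osd="$1"
    shift
    for weight in "$@"; do
        $CEPH osd crush reweight "$osd" "$weight"
        until $CEPH health | grep -q '^HEALTH_OK'; do
            sleep "$POLL_SECONDS"
        done
    done
}

# Example invocation (the step values are hypothetical):
#   ramp_weight osd.4 0.25 0.5 0.75 1
```

Smaller steps move less data at a time, which keeps client I/O more responsive at the cost of a longer overall migration.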

>>

>> Cary

>> -Dynamic

>>

>> On Thu, Dec 14, 2017 at 5:34 PM, James Okken <James.Okken@xxxxxxxxxxxx> wrote:

>>> Hi all,

>>>

>>> Please let me know if I am missing steps or using the wrong steps

>>>

>>> I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each of the 3 servers in the cluster.

>>>

>>> I also need to change my replication factor from 1 to 3.

>>> This is part of an OpenStack environment deployed by Fuel and I had foolishly set my replication factor to 1 in the Fuel settings before deploy. I know this would have been done better at the beginning. I do want to keep the current cluster and not start over. I know this is going to thrash my cluster for a while replicating, but there isn't too much data on it yet.

>>>

>>>

>>> To start I need to safely turn off each CEPH server and add in the 4TB drive:

>>> To do that I am going to run:

>>> ceph osd set noout

>>> systemctl stop ceph-osd@1 (or 2 or 3 on the other servers)

>>> ceph osd tree (to verify it is down)

>>> poweroff, install the 4TB drive, bootup again

>>> ceph osd unset noout

>>>

>>>

>>>

>>> Next step would be to get CEPH to use the 4TB drives. Each CEPH server already has an 836GB OSD.

>>>

>>> ceph> osd df

>>> ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS

>>>  0 0.81689  1.00000  836G 101G  734G 12.16 0.90 167

>>>  1 0.81689  1.00000  836G 115G  721G 13.76 1.02 166

>>>  2 0.81689  1.00000  836G 121G  715G 14.49 1.08 179

>>>               TOTAL 2509G 338G 2171G 13.47

>>> MIN/MAX VAR: 0.90/1.08  STDDEV: 0.97

>>>

>>> ceph> df

>>> GLOBAL:

>>>     SIZE      AVAIL     RAW USED     %RAW USED

>>>     2509G     2171G         338G         13.47

>>> POOLS:

>>>     NAME        ID     USED     %USED     MAX AVAIL     OBJECTS

>>>     rbd         0         0         0         2145G           0

>>>     images      1      216G      9.15         2145G       27745

>>>     backups     2         0         0         2145G           0

>>>     volumes     3      114G      5.07         2145G       29717

>>>     compute     4         0         0         2145G           0

>>>

>>>

>>> Once I get the 4TB drive into each CEPH server should I look to increasing the current OSD (ie: to 4836GB)?

>>> Or create a second 4000GB OSD on each CEPH server?

>>> If I am going to create a second OSD on each CEPH server I hope to use this doc:

>>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

>>>

>>>

>>>

>>> As far as changing the replication factor from 1 to 3:

>>> Here are my pools now:

>>>

>>> ceph osd pool ls detail

>>> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

>>> pool 1 'images' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 116 flags hashpspool stripe_width 0

>>>         removed_snaps [1~3,b~6,12~8,20~2,24~6,2b~8,34~2,37~20]

>>> pool 2 'backups' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 7 flags hashpspool stripe_width 0

>>> pool 3 'volumes' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 73 flags hashpspool stripe_width 0

>>>         removed_snaps [1~3]

>>> pool 4 'compute' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 34 flags hashpspool stripe_width 0

>>>

>>> I plan on using these steps I saw online:

>>> ceph osd pool set rbd size 3

>>> ceph -s  (Verify that replication completes successfully)

>>> ceph osd pool set images size 3

>>> ceph -s

>>> ceph osd pool set backups size 3

>>> ceph -s

>>> ceph osd pool set volumes size 3

>>> ceph -s

>>>

>>>

>>> please let me know any advice or better methods...

>>>

>>> thanks

>>>

>>> --Jim

>>>

>>> _______________________________________________

>>> ceph-users mailing list

>>> ceph-users@xxxxxxxxxxxxxx

>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
