Re: add hard drives to 3 CEPH servers (3 server cluster)

Cary <dynamic.cary@xxxxxxxxx> · Fri, 15 Dec 2017 22:55:51 +0000

James,

You can set these values in ceph.conf.

[global]
...
osd pool default size         = 3
osd pool default min size  = 2
...

New pools that are created will use those values.
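Those defaults only affect pools created afterwards. Existing pools keep whatever size they already have. The sketch below spots pools that still differ. It assumes the one-line-per-pool layout of `ceph osd pool ls detail` seen in this thread (`pool <id> '<name>' replicated size <n> min_size <m> ...`). `pools_not_at_size` is a hypothetical helper name, not a Ceph command.

```shell
# Read `ceph osd pool ls detail` on stdin and print
# "<pool> size=<n> min_size=<m>" for every replicated pool whose
# size or min_size differs from the wanted values.
pools_not_at_size() {
    want_size=$1
    want_min=$2
    awk -v ws="$want_size" -v wm="$want_min" '
        $1 == "pool" && $4 == "replicated" {
            name = $3
            gsub("\047", "", name)        # strip the quotes around the pool name
            size = ""; min = ""
            for (i = 5; i < NF; i++) {
                if ($i == "size") size = $(i + 1)
                if ($i == "min_size") min = $(i + 1)
            }
            if (size != ws || min != wm)
                print name " size=" size " min_size=" min
        }'
}

# Usage on a live cluster:
#   ceph osd pool ls detail | pools_not_at_size 3 2
```

An empty result means every replicated pool already matches.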

If you run "ceph -s" and look at the "usage" line, it shows how much
space is used, how much is available, and the total, in that order, e.g.

usage:   19465 GB used, 60113 GB / 79578 GB avail
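The sketch below splits that line into used, available, and total. It also prints a naive "available divided by replica count" figure. That last number is only a ballpark. It ignores the full ratios and how evenly CRUSH spreads the data. Ceph's own per-pool estimate is the MAX AVAIL column of `ceph df`. The helper name is hypothetical, and the parser assumes the exact field order of the example line.

```shell
# Read a `ceph -s` "usage:" line on stdin and print used/avail/total plus
# avail divided by the replica count (truncated to an integer).
# Assumes all three figures are in the same unit (GB in the example).
usage_summary() {
    replicas=$1
    awk -v r="$replicas" '$1 == "usage:" {
        used = $2; avail = $5; total = $8
        printf "used=%s avail=%s total=%s usable_approx=%d\n", used, avail, total, avail / r
    }'
}

# Usage:
#   ceph -s | usage_summary 3
```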

We choose to use Openstack with Ceph in this decade and do the other
things, not because they are easy, but because they are hard...;-p

Cary
-Dynamic

On Fri, Dec 15, 2017 at 10:12 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
> In conjunction with increasing the pool size to 3, also increase the pool
> min_size to 2.  `ceph df` and `ceph osd df` will eventually show the full
> size in use in your cluster.  In particular, the available size per pool
> shown by `ceph df` (MAX AVAIL) takes the pool's replication size into account.
> Continue watching ceph -s or ceph -w to see when the backfilling for your
> change to replication size finishes.
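If you would rather not sit and watch `ceph -w`, a polling loop can do the waiting. This is a sketch. It assumes `ceph health` prints a line starting with `HEALTH_OK` once recovery and backfill are done. `wait_for_health_ok` is a hypothetical name.

```shell
# Poll `ceph health` every $1 seconds (default 30) until it reports
# HEALTH_OK. Give up after $2 polls (default 0 = wait forever).
# Returns 0 once healthy, 1 on timeout.
wait_for_health_ok() {
    interval=${1:-30}
    max_checks=${2:-0}
    n=0
    while :; do
        status=$(ceph health)
        case $status in
            HEALTH_OK*) return 0 ;;
        esac
        n=$((n + 1))
        if [ "$max_checks" -gt 0 ] && [ "$n" -ge "$max_checks" ]; then
            echo "still not healthy after $n checks: $status" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

Any unrelated warning also keeps the cluster out of HEALTH_OK, for example the `noout` flag still being set. Give the loop a `max_checks` limit, or check `ceph health detail` if it never returns.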
>
> On Fri, Dec 15, 2017 at 5:06 PM James Okken <James.Okken@xxxxxxxxxxxx>
> wrote:
>>
>> This whole effort went extremely well, thanks to Cary, and I'm not used to
>> that with CEPH so far (or OpenStack, ever...).
>> Thank you Cary.
>>
>> I've upped the replication factor and now I see "replicated size 3" in each
>> of my pools. Is this the only place to check replication level? Is there a
>> Global setting or only a setting per Pool?
>>
>> ceph osd pool ls detail
>> pool 0 'rbd' replicated size 3......
>> pool 1 'images' replicated size 3...
>> ...
>>
>> One last question!
>> At this replication level how can I tell how much total space I actually
>> have now?
>> Do I just divide the global size by 3?
>>
>> ceph df
>> GLOBAL:
>>     SIZE       AVAIL      RAW USED     %RAW USED
>>     13680G     12998G         682G          4.99
>> POOLS:
>>     NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
>>     rbd         0         0         0         6448G           0
>>     images      1      216G      3.24         6448G       27745
>>     backups     2         0         0         6448G           0
>>     volumes     3      117G      1.79         6448G       30441
>>     compute     4         0         0         6448G           0
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR  PGS
>>  0 0.81689  1.00000   836G 36549M   800G 4.27 0.86  67
>>  4 3.70000  1.00000  3723G   170G  3553G 4.58 0.92 270
>>  1 0.81689  1.00000   836G 49612M   788G 5.79 1.16  56
>>  5 3.70000  1.00000  3723G   192G  3531G 5.17 1.04 282
>>  2 0.81689  1.00000   836G 33639M   803G 3.93 0.79  58
>>  3 3.70000  1.00000  3723G   202G  3521G 5.43 1.09 291
>>               TOTAL 13680G   682G 12998G 4.99
>> MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
>>
>> Thanks!
>>
>> -----Original Message-----
>> From: Cary [mailto:dynamic.cary@xxxxxxxxx]
>> Sent: Friday, December 15, 2017 4:05 PM
>> To: James Okken
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  add hard drives to 3 CEPH servers (3 server
>> cluster)
>>
>> James,
>>
>>  Those errors are normal. Ceph creates the missing files. You can check
>> "/var/lib/ceph/osd/ceph-6", before and after you run those commands to see
>> what files are added there.
>>
>>  Make sure you get the replication factor set.
>>
>>
>> Cary
>> -Dynamic
>>
>> On Fri, Dec 15, 2017 at 6:11 PM, James Okken <James.Okken@xxxxxxxxxxxx>
>> wrote:
>> > Thanks again Cary,
>> >
>> > Yes, once all the backfilling was done I was back to a Healthy cluster.
>> > I moved on to the same steps for the next server in the cluster, it is
>> > backfilling now.
>> > Once that is done I will do the last server in the cluster, and then I
>> > think I am done!
>> >
>> > Just checking on one thing. I get these messages when running this
>> > command. I assume this is OK, right?
>> > root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07
>> > 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open:
>> > disabling aio for non-block journal.  Use journal_force_aio to force
>> > use of aio anyway
>> > 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open:
>> > disabling aio for non-block journal.  Use journal_force_aio to force
>> > use of aio anyway
>> > 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1
>> > filestore(/var/lib/ceph/osd/ceph-4) could not find
>> > #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or
>> > directory
>> > 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store
>> > /var/lib/ceph/osd/ceph-4 for osd.4 fsid
>> > 2b9f7957-d0db-481e-923e-89972f6c594f
>> > 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file:
>> > /var/lib/ceph/osd/ceph-4/keyring: can't open
>> > /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
>> > 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring
>> > /var/lib/ceph/osd/ceph-4/keyring
>> >
>> > thanks
>> >
>> > -----Original Message-----
>> > From: Cary [mailto:dynamic.cary@xxxxxxxxx]
>> > Sent: Thursday, December 14, 2017 7:13 PM
>> > To: James Okken
>> > Cc: ceph-users@xxxxxxxxxxxxxx
>> > Subject: Re:  add hard drives to 3 CEPH servers (3 server
>> > cluster)
>> >
>> > James,
>> >
>> >  Usually, once the misplaced data has been rebalanced, the cluster should
>> > reach a healthy state. If you run "ceph health detail", Ceph will show you
>> > more detail about what is happening. Is Ceph still recovering, or has
>> > it stalled? Has the "objects misplaced (62.511%)" figure dropped to a
>> > lower percentage?
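To check whether that percentage is dropping, you can pull it out of `ceph -s` a few minutes apart and compare. This is a minimal sketch. It assumes the `N/M objects misplaced (P%)` wording shown in the status output in this thread. `misplaced_pct` is a hypothetical helper.

```shell
# Print the first "objects misplaced" percentage found on stdin,
# e.g. "52912/84644 objects misplaced (62.511%)" -> 62.511.
# Prints nothing if no misplaced objects are reported.
misplaced_pct() {
    sed -n 's/.*objects misplaced (\([0-9.]*\)%).*/\1/p' | head -n 1
}

# Usage:
#   ceph -s | misplaced_pct
```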
>> >
>> > Cary
>> > -Dynamic
>> >
>> > On Thu, Dec 14, 2017 at 10:52 PM, James Okken <James.Okken@xxxxxxxxxxxx>
>> > wrote:
>> >> Thanks Cary!
>> >>
>> >> Your directions worked on my first server (once I found the missing
>> >> carriage return in your list of commands; the email must have messed it up).
>> >>
>> >> For anyone else:
>> >> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>> >> really is 2 commands:
>> >> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
>> >> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>> >>
>> >> Cary, what am I looking for in ceph -w and ceph -s to show the status
>> >> of the data moving?
>> >> Seems like the data is moving and that I have some issue...
>> >>
>> >> root@node-53:~# ceph -w
>> >>     cluster 2b9f7957-d0db-481e-923e-89972f6c594f
>> >>      health HEALTH_WARN
>> >>             176 pgs backfill_wait
>> >>             1 pgs backfilling
>> >>             27 pgs degraded
>> >>             1 pgs recovering
>> >>             26 pgs recovery_wait
>> >>             27 pgs stuck degraded
>> >>             204 pgs stuck unclean
>> >>             recovery 10322/84644 objects degraded (12.195%)
>> >>             recovery 52912/84644 objects misplaced (62.511%)
>> >>      monmap e3: 3 mons at {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
>> >>             election epoch 138, quorum 0,1,2 node-45,node-44,node-43
>> >>      osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
>> >>             flags sortbitwise,require_jewel_osds
>> >>       pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
>> >>             370 GB used, 5862 GB / 6233 GB avail
>> >>             10322/84644 objects degraded (12.195%)
>> >>             52912/84644 objects misplaced (62.511%)
>> >>                  308 active+clean
>> >>                  176 active+remapped+wait_backfill
>> >>                   26 active+recovery_wait+degraded
>> >>                    1 active+remapped+backfilling
>> >>                    1 active+recovering+degraded
>> >> recovery io 100605 kB/s, 14 objects/s
>> >>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr
>> >>
>> >> 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1
>> >> activating, 1 active+recovering+degraded, 26
>> >> active+recovery_wait+degraded, 1 active+remapped+backfilling, 307
>> >> active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB
>> >> used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s;
>> >> 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced
>> >> (62.511%); 12224 kB/s, 2 objects/s recovering
>> >> 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1
>> >> active+recovering+degraded, 26 active+recovery_wait+degraded, 1
>> >> active+remapped+backfilling, 308 active+clean, 176
>> >> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB /
>> >> 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects
>> >> degraded (12.195%); 52912/84644 objects misplaced (62.511%); 100605
>> >> kB/s, 14 objects/s recovering
>> >> 2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1
>> >> active+recovering+degraded, 26 active+recovery_wait+degraded, 1
>> >> active+remapped+backfilling, 308 active+clean, 176
>> >> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB /
>> >> 6233 GB avail; 0 B/s rd, 434 kB/s wr, 45 op/s; 10322/84644 objects
>> >> degraded (12.195%); 52912/84644 objects misplaced (62.511%); 84234
>> >> kB/s, 10 objects/s recovering
>> >> 2017-12-14 22:46:02.482228 mon.0 [INF] pgmap v3936177: 512 pgs: 1
>> >> active+recovering+degraded, 26 active+recovery_wait+degraded, 1
>> >> active+remapped+backfilling, 308 active+clean, 176
>> >> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB /
>> >> 6233 GB avail; 0 B/s rd, 334 kB/s wr
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Cary [mailto:dynamic.cary@xxxxxxxxx]
>> >> Sent: Thursday, December 14, 2017 4:21 PM
>> >> To: James Okken
>> >> Cc: ceph-users@xxxxxxxxxxxxxx
>> >> Subject: Re:  add hard drives to 3 CEPH servers (3 server
>> >> cluster)
>> >>
>> >> Jim,
>> >>
>> >> I am not an expert, but I believe I can assist.
>> >>
>> >>  Normally you will only have 1 OSD per drive. I have heard discussions
>> >> about using multiple OSDs per disk, when using SSDs though.
>> >>
>> >>  Once your drives have been installed you will have to format them,
>> >> unless you are using Bluestore. My steps for formatting are below.
>> >> Replace the sXX with your drive name.
>> >>
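>> >> parted -a optimal /dev/sXX
>> >> # Then, at the (parted) prompt:
>> >> print
>> >> mklabel gpt
>> >> unit mib
>> >> # Partition named OSD4sdd1, from 1 MiB to the end of the disk
>> >> # (negative positions count back from the end).
>> >> mkpart OSD4sdd1 1 -1
>> >> quit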
>> >> mkfs.xfs -f /dev/sXX1
>> >>
>> >> # Run blkid, and copy the UUID for the newly formatted drive.
>> >> blkid
>> >> # Add the mount point/UUID to fstab. The mount point will be created
>> >> # later.
>> >> vi /etc/fstab
>> >> # For example
>> >> UUID=6386bac4-7fef-3cd2-7d64-13db51d83b12 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64,logbufs=8 0 0
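Instead of copying the UUID by hand, you can generate the fstab entry from `blkid -s UUID -o value`, which prints just the filesystem UUID. This is a sketch. It assumes the mount options from the example above and the default /var/lib/ceph/osd/ceph-N layout. `fstab_line` is a hypothetical helper, and /dev/sdd1 is only an example device.

```shell
# fstab_line <partition> <osd-id>: print an fstab entry for an OSD data
# partition, using the mount options from the example above.
fstab_line() {
    uuid=$(blkid -s UUID -o value "$1") || return 1
    printf 'UUID=%s /var/lib/ceph/osd/ceph-%s xfs rw,noatime,inode64,logbufs=8 0 0\n' "$uuid" "$2"
}

# Usage (as root), instead of editing fstab in vi:
#   fstab_line /dev/sdd1 4 >> /etc/fstab
```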
>> >>
>> >>
>> >> # You can then add the OSD to the cluster.
>> >>
>> >> uuidgen
>> >> # Replace the UUID below with the UUID that was created with uuidgen.
>> >> ceph osd create 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
>> >>
>> >> # Note which OSD number it creates; it is usually the lowest available
>> >> # OSD number.
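`ceph osd create` prints the number it assigned, so a script can capture it instead of reading it off the screen. This is a sketch. `create_osd_entry` is a hypothetical name. It prints the id and the UUID on one line so that later steps can reuse both.

```shell
# Generate a UUID, register a new OSD with it, and print "<id> <uuid>".
create_osd_entry() {
    uuid=$(uuidgen) || return 1
    id=$(ceph osd create "$uuid") || return 1
    echo "$id $uuid"
}

# Usage:
#   set -- $(create_osd_entry)
#   OSD_ID=$1 OSD_UUID=$2
```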
>> >>
>> >> # Add osd.4 to ceph.conf on all Ceph nodes.
>> >> vi /etc/ceph/ceph.conf
>> >> ...
>> >> [osd.4]
>> >> public addr = 172.1.3.1
>> >> cluster addr = 10.1.3.1
>> >> ...
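You can print the same section for any OSD id and append it on each node. This is a sketch. `osd_conf_section` is a hypothetical helper, and the addresses are the example values above. Check the file first so that the section is not added twice.

```shell
# osd_conf_section <osd-id> <public-addr> <cluster-addr>
osd_conf_section() {
    printf '[osd.%s]\npublic addr = %s\ncluster addr = %s\n' "$1" "$2" "$3"
}

# Usage, on every Ceph node:
#   grep -q '^\[osd\.4\]' /etc/ceph/ceph.conf ||
#       osd_conf_section 4 172.1.3.1 10.1.3.1 >> /etc/ceph/ceph.conf
```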
>> >>
>> >> # Now add the mount point.
>> >> mkdir -p /var/lib/ceph/osd/ceph-4
>> >> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
>> >>
>> >> # The command below mounts everything in fstab.
>> >> mount -a
>> >> # The number after -i below needs to be changed to the correct OSD ID, and
>> >> # the osd-uuid needs to be changed to the UUID created with uuidgen above.
>> >> # Your keyring location may be different and need to be changed as well.
>> >> ceph-osd -i 4 --mkfs --mkkey --osd-uuid 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
>> >> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
>> >> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>> >>
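These are the same three steps, parameterised on the OSD id and UUID. This is a sketch, not a drop-in script. It assumes the key written by `--mkkey` lands in /var/lib/ceph/osd/ceph-N/keyring, not /etc/ceph/ceph.osd.N.keyring. The `ceph-osd --mkfs` output quoted earlier in this thread reports creating the key at that location. `init_osd`, `run` and `DRY_RUN` are hypothetical names. With `DRY_RUN=1` set, the commands are only printed.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# init_osd <id> <uuid>: create the object store and key, fix ownership,
# and register the key with the monitors.
init_osd() {
    id=$1 uuid=$2
    dir=/var/lib/ceph/osd/ceph-$id
    run ceph-osd -i "$id" --mkfs --mkkey --osd-uuid "$uuid" || return 1
    # ceph-osd is run as root in this procedure, so hand the files to ceph
    run chown -R ceph:ceph "$dir" || return 1
    # Assumption: --mkkey wrote the key to $dir/keyring
    run ceph auth add "osd.$id" osd 'allow *' mon 'allow profile osd' -i "$dir/keyring"
}

# Preview first, then run for real:
#   DRY_RUN=1 init_osd 4 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
#   init_osd 4 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
```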
>> >> # Add the new OSD to its host in the crush map.
>> >> ceph osd crush add osd.4 .0 host=YOURhostNAME
>> >>
>> >> # Since the weight used in the previous step was .0, you will need to
>> >> # increase it. I use 1 for a 1TB drive and 5 for a 5TB drive. The command
>> >> # below will reweight osd.4 to 1. You may need to ramp this number up
>> >> # slowly, e.g. .10, then .20, etc.
>> >> ceph osd crush reweight osd.4 1
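Here are two helpers for this step, as a sketch. The first converts a device size in bytes to TiB. That matches the weights in the `ceph osd df` output in this thread, where the 836G OSDs sit at 0.81689. The second raises a weight in fixed steps and waits for HEALTH_OK between steps. Both function names are hypothetical. The step size and the wait interval are assumptions to tune for your cluster.

```shell
# Size in bytes -> weight in TiB with 3 decimals
# (matches the 0.81689 weight of the 836G OSDs in this thread).
crush_weight_for_bytes() {
    awk -v b="$1" 'BEGIN { printf "%.3f\n", b / 1099511627776 }'
}

# ramp_crush_weight <osd> <start> <target> <step> [interval]
# Raise the CRUSH weight from start to target in increments of step,
# waiting for HEALTH_OK after each change.
ramp_crush_weight() {
    osd=$1 start=$2 target=$3 step=$4 interval=${5:-60}
    w=$start
    while :; do
        # next weight = w + step, capped at target
        w=$(awk -v w="$w" -v s="$step" -v t="$target" \
            'BEGIN { n = w + s; if (n > t) n = t; printf "%.3f\n", n }')
        ceph osd crush reweight "$osd" "$w" || return 1
        until ceph health | grep -q '^HEALTH_OK'; do
            sleep "$interval"
        done
        # stop once the target has been reached
        if [ "$(awk -v w="$w" -v t="$target" 'BEGIN { print (w >= t) }')" = 1 ]; then
            return 0
        fi
    done
}

# Usage (hypothetical values): ramp osd.4 from 0 to the weight of a 4 TB disk
#   ramp_crush_weight osd.4 0 "$(crush_weight_for_bytes 4000000000000)" 0.5
```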
>> >>
>> >> You should now be able to start the OSD (e.g. systemctl start ceph-osd@4).
>> >> You can watch the data move to the drive with ceph -w. Once data has
>> >> migrated to the drive, start the next one.
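You can also script starting the OSD and confirming that it came up. This is a sketch. It assumes the systemd unit naming `ceph-osd@N` used elsewhere in this thread. It also assumes a `ceph osd tree` layout in which the up/down status follows the `osd.N` name. `start_osd_and_wait` is a hypothetical name.

```shell
# start_osd_and_wait <osd-id> [interval] [max_checks]
# Start the OSD daemon, then poll `ceph osd tree` until it shows "up".
start_osd_and_wait() {
    id=$1 interval=${2:-5} max_checks=${3:-60}
    systemctl start "ceph-osd@$id" || return 1
    n=0
    while [ "$n" -lt "$max_checks" ]; do
        # the field right after the exact name "osd.<id>" is its status
        state=$(ceph osd tree | awk -v name="osd.$id" \
            '{ for (i = 1; i < NF; i++) if ($i == name) print $(i + 1) }')
        [ "$state" = up ] && return 0
        n=$((n + 1))
        sleep "$interval"
    done
    echo "osd.$id is not up (last state: ${state:-unknown})" >&2
    return 1
}
```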
>> >>
>> >> Cary
>> >> -Dynamic
>> >>
>> >> On Thu, Dec 14, 2017 at 5:34 PM, James Okken <James.Okken@xxxxxxxxxxxx>
>> >> wrote:
>> >>> Hi all,
>> >>>
>> >>> Please let me know if I am missing steps or using the wrong steps.
>> >>>
>> >>> I'm hoping to expand my small CEPH cluster by adding 4TB hard drives
>> >>> to each of the 3 servers in the cluster.
>> >>>
>> >>> I also need to change my replication factor from 1 to 3.
>> >>> This is part of an Openstack environment deployed by Fuel and I had
>> >>> foolishly set my replication factor to 1 in the Fuel settings before deploy.
>> >>> I know this would have been done better at the beginning. I do want to keep
>> >>> the current cluster and not start over. I know this is going to thrash my
>> >>> cluster for a while replicating, but there isn't too much data on it yet.
>> >>>
>> >>>
>> >>> To start, I need to safely turn off each CEPH server and add in the 4TB
>> >>> drive. To do that I am going to run:
>> >>> ceph osd set noout
>> >>> systemctl stop ceph-osd@1   (or 2 or 3 on the other servers)
>> >>> ceph osd tree   (to verify it is down)
>> >>> poweroff, install the 4TB drive, boot up again
>> >>> ceph osd unset noout
>> >>>
>> >>>
>> >>>
>> >>> Next step would be to get CEPH to use the 4TB drives. Each CEPH
>> >>> server already has a 836GB OSD.
>> >>>
>> >>> ceph> osd df
>> >>> ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
>> >>>  0 0.81689  1.00000  836G 101G  734G 12.16 0.90 167
>> >>>  1 0.81689  1.00000  836G 115G  721G 13.76 1.02 166
>> >>>  2 0.81689  1.00000  836G 121G  715G 14.49 1.08 179
>> >>>               TOTAL 2509G 338G 2171G 13.47
>> >>> MIN/MAX VAR: 0.90/1.08  STDDEV: 0.97
>> >>>
>> >>> ceph> df
>> >>> GLOBAL:
>> >>>     SIZE      AVAIL     RAW USED     %RAW USED
>> >>>     2509G     2171G         338G         13.47
>> >>> POOLS:
>> >>>     NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
>> >>>     rbd         0         0         0         2145G           0
>> >>>     images      1      216G      9.15         2145G       27745
>> >>>     backups     2         0         0         2145G           0
>> >>>     volumes     3      114G      5.07         2145G       29717
>> >>>     compute     4         0         0         2145G           0
>> >>>
>> >>>
>> >>> Once I get the 4TB drive into each CEPH server, should I look at
>> >>> increasing the current OSD (i.e. to 4836GB)?
>> >>> Or create a second 4000GB OSD on each CEPH server?
>> >>> If I am going to create a second OSD on each CEPH server I hope to use
>> >>> this doc:
>> >>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
>> >>>
>> >>>
>> >>>
>> >>> As far as changing the replication factor from 1 to 3:
>> >>> Here are my pools now:
>> >>>
>> >>> ceph osd pool ls detail
>> >>> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
>> >>> pool 1 'images' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 116 flags hashpspool stripe_width 0
>> >>>         removed_snaps [1~3,b~6,12~8,20~2,24~6,2b~8,34~2,37~20]
>> >>> pool 2 'backups' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 7 flags hashpspool stripe_width 0
>> >>> pool 3 'volumes' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 73 flags hashpspool stripe_width 0
>> >>>         removed_snaps [1~3]
>> >>> pool 4 'compute' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 34 flags hashpspool stripe_width 0
>> >>>
>> >>> I plan on using these steps I saw online:
>> >>> ceph osd pool set rbd size 3
>> >>> ceph -s   (verify that replication completes successfully)
>> >>> ceph osd pool set images size 3
>> >>> ceph -s
>> >>> ceph osd pool set backups size 3
>> >>> ceph -s
>> >>> ceph osd pool set volumes size 3
>> >>> ceph -s
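The same plan can be written as a loop. This sketch makes a few assumptions. It walks `ceph osd pool ls`, so every pool is covered, including `compute`, which the list above leaves out. It waits for `ceph health` to report HEALTH_OK after each size change. It raises `min_size` to 2 only after that pool has finished backfilling. `raise_replication` is a hypothetical name.

```shell
# raise_replication [size] [min_size] [interval]
# For each pool: set size, wait for HEALTH_OK, then set min_size.
raise_replication() {
    size=${1:-3} min=${2:-2} interval=${3:-60}
    for pool in $(ceph osd pool ls); do
        echo "== $pool: size $size" >&2
        ceph osd pool set "$pool" size "$size" || return 1
        until ceph health | grep -q '^HEALTH_OK'; do
            sleep "$interval"
        done
        ceph osd pool set "$pool" min_size "$min" || return 1
    done
}
```

The order is deliberate. If min_size were set before the extra copies exist, placement groups that still hold a single copy could sit below min_size, which would block I/O to them. Raising it afterwards avoids that window.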
>> >>>
>> >>>
>> >>> Please let me know of any advice or better methods...
>> >>>
>> >>> thanks
>> >>>
>> >>> --Jim
>> >>>
>> >>> _______________________________________________
>> >>> ceph-users mailing list
>> >>> ceph-users@xxxxxxxxxxxxxx
>> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com