Re: CRUSH Map Adjustment for Node Replication

Georgios,

It really depends on how busy and powerful your cluster is, as Robert
wrote.
If in doubt, lower the backfill value as Robert pointed out.
Also look at the osd_scrub_load_threshold and, with new enough
versions of Ceph, at the osd_scrub_sleep setting; these are very
helpful in keeping deep scrubs from making the cluster excessively
sluggish.
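
A minimal sketch of how those two settings could be applied, either at
runtime or persistently; the values are purely illustrative, not
recommendations:

    ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.3 --osd_scrub_sleep 0.1'

    # or in the [osd] section of ceph.conf
    [osd]
    osd scrub load threshold = 0.3
    osd scrub sleep = 0.1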

Then make the CRUSH change at a time when your cluster is least busy
(weekend nights for many people). 
Wait until the data movement has finished.
After that (maybe the next night) deep scrub all your OSDs, either
sequentially (less impact):

"ceph osd deep-scrub 0" ...

or, if your cluster is fast enough, all at once:

"ceph osd deep-scrub \*"

My current clusters are fast enough to do this within a few hours, so
once you have kicked a deep scrub off at the right time it will (with
default settings) happen again a week later, which means that (in my
case at least) deep scrubs never run during business hours.
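
That weekly cadence comes from the default deep scrub interval of
604800 seconds (one week); if another window suits you better, it can
be changed in ceph.conf, for example:

    [osd]
    # default is 604800 seconds (one week)
    osd deep scrub interval = 604800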

Of course people with really large clusters tend to have enough reserves
that deep scrubs (and rebuilds/backfills due to failed OSDs) during peak
times are not an issue at all (looking at Dan over at CERN ^o^).

Christian

On Mon, 23 Mar 2015 17:24:25 -0600 Robert LeBlanc wrote:

> I don't believe that you can set the schedule of the deep scrubs.
> People who want that kind of control disable deep scrubs and run a
> script to scrub all PGs. For the other options, you should look
> through http://ceph.com/docs/master/rados/configuration/osd-config-ref/
> and find what you feel is most important to you. We mess with
> "osd max backfills". You may also want to look at "osd recovery max
> active" and "osd recovery op priority", to name a few. And you can
> adjust the idle-load threshold at which the cluster performs deep
> scrubs, etc.
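
A rough sketch of how those three options could be tightened at
runtime; the values of 1 below are only placeholders for "as gentle
as possible", not recommendations:

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'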
> 
> On Mon, Mar 23, 2015 at 5:10 PM, Dimitrakakis Georgios
> <giorgis@xxxxxxxxxxxx> wrote:
> > Robert thanks for the info!
> >
> > How can I find out and modify when the next deep scrub is
> > scheduled, as well as the number of backfill processes and their
> > priority?
> >
> > Best regards,
> >
> > George
> >
> >
> >
> > ---- Robert LeBlanc wrote ----
> >
> >
> > You just need to change your rule from
> >
> > step chooseleaf firstn 0 type osd
> >
> > to
> >
> > step chooseleaf firstn 0 type host
> >
> > There will be data movement as it will want to move about half the
> > objects to the new host. There will be data generation as you move
> > from size 1 to size 2. As far as I know a deep scrub won't happen
> > until the next scheduled time. The time to do all of this is dependent
> > on your disk speed, network speed, CPU and RAM capacity as well as the
> > number of backfill processes configured, the priority of the backfill
> > process, how active your disks are and how much data you have stored
> > in the cluster. In short ... it depends.
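
For reference, a sketch of the usual export/edit/re-inject cycle for
that rule change (the file names here are arbitrary):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt: change "type osd" to "type host" in the rule
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new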
> >
> > On Mon, Mar 23, 2015 at 4:30 PM, Georgios Dimitrakakis
> > <giorgis@xxxxxxxxxxxx> wrote:
> >> Hi all!
> >>
> >> I had a Ceph cluster with 10x OSDs, all of them in one node.
> >>
> >> Since the cluster was built from the beginning with just one OSD
> >> node, the crushmap by default replicated across OSDs.
> >>
> >> Here is the relevant part from my crushmap:
> >>
> >>
> >> # rules
> >> rule replicated_ruleset {
> >>         ruleset 0
> >>         type replicated
> >>         min_size 1
> >>         max_size 10
> >>         step take default
> >>         step chooseleaf firstn 0 type osd
> >>         step emit
> >> }
> >>
> >> # end crush map
> >>
> >>
> >> I have added a new node with 10x more identical OSDs, so the total
> >> number of OSD nodes is now two.
> >>
> >> I have changed the replication factor to 2 on all pools, and I
> >> would like to make sure that each copy is always kept on a
> >> different node.
> >>
> >> In order to do so do I have to change the CRUSH map?
> >>
> >> Which part should I change?
> >>
> >>
> >> After modifying the CRUSH map, what procedure will take place
> >> before the cluster is ready again?
> >>
> >> Is it going to start re-balancing and moving data around? Will a
> >> deep-scrub follow?
> >>
> >> Does the time of the procedure depend on anything other than the
> >> amount of data and the available connection (bandwidth)?
> >>
> >>
> >> Looking forward to your answers!
> >>
> >>
> >> All the best,
> >>
> >>
> >> George
> >>


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




