Re: Maximizing OSD to PG quantity

Hello,

On Wed, 6 Apr 2016 18:15:57 +0000 David Turner wrote:

> You can mitigate how much it affects the IO, but at the cost of how long
> it will take to complete.
> 
> ceph tell osd.* injectargs '--osd-max-backfills #'
> 
Also have a read of:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg27970.html
for more knobs to twiddle.
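
For example, something along these lines keeps the recovery and backfill
traffic in check while things reshuffle (values purely illustrative, check
the defaults of your release and tune for your hardware):

ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'

And raise them again once the cluster has settled.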

> Where # is the maximum number of PGs any OSD will backfill at any given
> time.  This is the same setting that is used when you add, remove,
> lose, or reweight OSDs in your cluster.  The lower the number, the less
> impact on cluster IO, but the longer it will take to finish the task.
> A max-backfills of 5 seems to work out well enough to get through things
> in a timely manner while not critically impacting IO.  I up that to
> 20 if I need speed more than IO.  These numbers are very dependent on
> your individual hardware and configuration.
>
Very, very true words.

Which brings me to the OP: you haven't told us your cluster details.
12 OSDs sounds like 2 hosts with 6 OSDs each to me.
If that's the case, you'll need/want a 3rd host.

If you already have 3 or more storage nodes, you can go ahead with the
replica increase, but note that this will not only reduce your usable
storage capacity accordingly but also have an impact on performance:
one more OSD will have to ACK each write. This will be particularly
noticeable with non-SSD journals, but the additional network latency will
be there in any case.
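
The change itself is just a pool setting, something like this (the pool
name is of course a placeholder):

ceph osd pool set <poolname> size 3
ceph osd pool set <poolname> min_size 2

and then let the resulting backfill run with the throttles David mentions
above.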

Christian

> ________________________________________
> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of
> Oliver Dzombic [info@xxxxxxxxxxxxxxxxx]
> Sent: Wednesday, April 06, 2016 11:45 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Maximizing OSD to PG quantity
> 
> Hi,
> 
> Huge, deadly IO :-)
> 
> Imagine, everything has to be replicated one more time. That is not
> something that will go smoothly :-)
> 
> --
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:info@xxxxxxxxxxxxxxxxx
> 
> Address:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> Commercial register HRB 93402, Amtsgericht (district court) Hanau
> Managing director: Oliver Dzombic
> 
> Tax no.: 35 236 3622 1
> VAT ID: DE274086107
> 
> 
> On 06.04.2016 at 16:41, dan@xxxxxxxxxxxxxxxxx wrote:
> > Will changing the replication size from 2 to 3 cause huge I/O resources
> > to be used, or does this happen quietly in the background?
> >
> >
> > On 2016-04-06 00:40, Christian Balzer wrote:
> >> Hello,
> >>
> >> Brian already mentioned a number of very pertinent things; I've got a few
> >> more:
> >>
> >> On Tue, 05 Apr 2016 10:48:49 -0400 dan@xxxxxxxxxxxxxxxxx wrote:
> >>
> >>> In a 12 OSD setup, the following formula is used:
> >>>
> >>>             (OSDs * 100)
> >>> Total PGs = ------------
> >>>               pool size
> >>>
> >>
> >> The PGcalc page at http://ceph.com/pgcalc/ is quite helpful and
> >> contains a
> >> lot of background info as well.
> >>
> >> As Brian said, you can never decrease PG count, but growing it is
> >> also a very I/O intensive operation and you want to avoid that as
> >> much as possible.
> >>
> >>>
> >>> So with 12 OSDs and a pool size of 2 replicas, this would give a
> >>> Total PGs of 600, as per this URL:
> >> PGcalc with a target of 200 PGs per OSD (doubling of cluster size
> >> expected) gives us 1024, which is also what I would go for myself.
> >>
> >> However, if this is a production cluster and your OSDs are NOT RAID1 or
> >> very very reliable, fast and well monitored SSDs you're basically
> >> asking Murphy
> >> to come visit, destroying your data while eating babies and washing
> >> them down with bath water.
> >>
> >> The default replication size was changed to 3 for a very good reason,
> >> there are plenty of threads in this ML about failure scenarios and
> >> probabilities.
> >>
> >> Christian
> >>
> >>>
> >>> http://docs.ceph.com/docs/master/rados/operations/placement-groups/#preselection
> >>>
> >>>
> >>> Yet on the same page, at the top, it says:
> >>>
> >>> Between 10 and 50 OSDs set pg_num to 4096
> >>>
> >>> Our use is for shared hosting so there are lots of small writes and
> >>> reads.  Which of these would be correct?
> >>>
> >>> Also is it a simple process to update PGs on a live system without
> >>> affecting service?
> 
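
As for the quoted question about raising the PG count on a live system:
the 1024 above comes out of roughly (200 target PGs per OSD * 12 OSDs) / 2
replicas = 1200, rounded to the nearest power of two. The commands
themselves are trivial (pool name again a placeholder), it is the
resulting data movement that hurts, so do it during a quiet period and
consider stepping pg_num up in increments:

ceph osd pool set <poolname> pg_num 1024
ceph osd pool set <poolname> pgp_num 1024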


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



