You can mitigate how much it affects IO, but at the cost of how long it takes to complete:

    ceph tell osd.* injectargs '--osd-max-backfills #'

Where # is the maximum number of PGs any single OSD will backfill at any given time. This is the same setting that is used when you add, remove, lose, or reweight OSDs in your cluster. The lower the number, the less impact on cluster IO, but the longer it will take to finish the task. A max-backfills of 5 seems to work out well enough to get through things in a timely manner while not critically impacting IO. I raise that to 20 if I need speed more than IO. These numbers are very dependent on your individual hardware and configuration.

________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Oliver Dzombic [info@xxxxxxxxxxxxxxxxx]
Sent: Wednesday, April 06, 2016 11:45 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Maximizing OSD to PG quantity

Hi,

huge, deadly, IO :-)

Imagine: everything has to be replicated one more time. That is not something that will go smoothly :-)

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107

Am 06.04.2016 um 16:41 schrieb dan@xxxxxxxxxxxxxxxxx:
> Will changing the replication size from 2 to 3 cause huge I/O resources
> to be used, or does this happen quietly in the background?
>
>
> On 2016-04-06 00:40, Christian Balzer wrote:
>> Hello,
>>
>> Brian already mentioned a number of very pertinent things; I've got a
>> few more:
>>
>> On Tue, 05 Apr 2016 10:48:49 -0400 dan@xxxxxxxxxxxxxxxxx wrote:
>>
>>> In a 12 OSD setup, the following config is there:
>>>
>>>             (OSDs * 100)
>>> Total PGs = ------------
>>>              pool size
>>>
>>
>> The PGcalc page at http://ceph.com/pgcalc/ is quite helpful and
>> contains a lot of background info as well.
>>
>> As Brian said, you can never decrease the PG count, but growing it is
>> also a very I/O-intensive operation, so you want to avoid that as much
>> as possible.
>>
>>> So with 12 OSDs and a pool size of 2 replicas, this would give
>>> 12 * 100 / 2 = 600 total PGs, as per this URL:
>>
>> PGcalc with a target of 200 PGs per OSD (a doubling of the cluster
>> size expected) gives us 1024, which is also what I would go for
>> myself.
>>
>> However, if this is a production cluster and your OSDs are NOT RAID1
>> or very reliable, fast, and well-monitored SSDs, you are basically
>> asking Murphy to come visit, destroying your data while eating babies
>> and washing them down with bath water.
>>
>> The default replication size was changed to 3 for a very good reason;
>> there are plenty of threads in this ML about failure scenarios and
>> probabilities.
>>
>> Christian
>>
>>> http://docs.ceph.com/docs/master/rados/operations/placement-groups/#preselection
>>>
>>> Yet on the same page, at the top, it says:
>>>
>>>     Between 10 and 50 OSDs set pg_num to 4096
>>>
>>> Our use is for shared hosting, so there are lots of small writes and
>>> reads. Which of these would be correct?
>>>
>>> Also, is it a simple process to update PGs on a live system without
>>> affecting service?
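
To that last question: updating PGs on a live cluster is a routine operation, but it moves data, so throttle backfill first and step up in increments. A minimal sketch, assuming a pool named "rbd" (substitute your own pool name) and a target of 1024 PGs; pg_num creates the new placement groups, while pgp_num is what actually starts rebalancing data onto them:

    # sketch only -- "rbd" below is a placeholder pool name
    ceph tell osd.* injectargs '--osd-max-backfills 5'   # throttle recovery first, per the advice above
    ceph osd pool get rbd pg_num                         # check the current value
    ceph osd pool set rbd pg_num 1024                    # create the new PGs
    ceph osd pool set rbd pgp_num 1024                   # begin moving data onto them
    ceph -s                                              # wait for the cluster to settle before the next change

The same throttling applies to the replication change: "ceph osd pool set rbd size 3" kicks off the same kind of backfill traffic.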
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com