Robert,
Interesting results on the effect of the number of PGs/PGPs. My cluster
struggles a bit under heavy random small-block writes.
The IOPS you mention seem high to me given 30 drives and 3x replication
unless they were pure reads or on high-rpm drives. Instead of assuming,
I want to pose a few questions:
- How are you testing? rados bench, rbd bench, rbd bench with writeback
cache, etc?
- Were the 2000-2500 random 4k IOPS more reads than writes? If you test
100% 4k random reads, what do you get? If you test 100% 4k random
writes, what do you get?
- What drives do you have? Any RAID involved under your OSDs?
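For reference, the kind of test I have in mind looks roughly like the following. (The pool name, runtime, and queue depth are placeholders, not your actual settings; note that a `rand` read run needs objects left behind by a `--no-cleanup` write run.)

```shell
# 4k writes via rados bench; keep the objects around for the read test.
# "testpool", 60s runtime, and -t 16 are placeholders -- adjust to your setup.
rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup

# 4k random reads against the objects written above.
rados bench -p testpool 60 rand -t 16

# Alternatively, 4k random writes against a mapped RBD device with fio.
# Assumes the image is mapped at /dev/rbd0 -- this will overwrite its data.
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=4k --iodepth=16 --runtime=60 --filename=/dev/rbd0
```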
Thanks,
Mike Dawson
On 12/3/2013 1:31 AM, Robert van Leeuwen wrote:
On 2 dec. 2013, at 18:26, "Brian Andrus" <brian.andrus@xxxxxxxxxxx> wrote:
Setting your pg_num and pgp_num to say... 1024 would A) increase data granularity, B) likely lend no noticeable increase to resource consumption, and C) allow some room for future OSDs to be added while still within range of acceptable pg numbers. You could probably safely double even that number if you plan on expanding at a rapid rate and want to avoid splitting PGs every time a node is added.
In general, you can conservatively err on the larger side when it comes to pg_num/pgp_num. Any excess resource utilization will be negligible (up to a certain point). If you have a comfortable amount of available RAM, you could experiment with increasing the multiplier in the equation you are using and see how it affects your final number.
The pg_num and pgp_num parameters can safely be changed before or after your new nodes are integrated.
I would be a bit conservative with the PGs/PGPs.
I've experimented with the PG number a bit and noticed the following random IO performance drop.
(This could be specific to our setup, but since pg_num is easy to increase and impossible to decrease, I would be conservative.)
The setup:
3 OSD nodes with 128GB ram, 2 * 6 core CPU (12 with ht).
Nodes have 10 OSDs running on 1 TB disks and 2 SSDs for journals.
We use a replica count of 3, so the optimum according to the formula is about 1000.
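That formula is the usual (number of OSDs × 100) / replica count rule of thumb; a quick sketch with our numbers (3 nodes × 10 OSDs):

```shell
# Rough PG target: (OSDs * 100) / replica count.
# Many guides then round up to the nearest power of two (here, 1024).
osds=30
replicas=3
echo $(( osds * 100 / replicas ))   # prints 1000
```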
With 1000 PGs I got about 2000-2500 random 4k IOPS.
Because the nodes are fast enough and I expect the cluster to be expanded with 3 more nodes, I set the PGs to 2000.
Performance dropped to about 1200-1400 IOPS.
I noticed that the spinning disks were no longer maxing out at 100% utilization.
Memory and CPU did not seem to be a problem.
Since I had the option to recreate the pool, and I was not using the recommended settings, I did not really dive into the issue.
I will not stray too far from the recommended settings in the future though :)
Cheers,
Robert van Leeuwen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com