Hi Christian,

On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote:
> On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote:
> Most your questions would be easily answered if you did spend a few
> minutes with even the crappiest test cluster and observing things (with
> atop and the likes).

You're right of course. I'll set up a test cluster and start
experimenting, which I should have done before asking questions here.

> To wit, this is a test pool (12) created with 32 PGs and slightly filled
> with data via rados bench:
> ---
> # ls -la /var/lib/ceph/osd/ceph-8/current/ | grep "12\."
> drwxr-xr-x 2 root root 4096 May 17 10:04 12.13_head
> drwxr-xr-x 2 root root 4096 May 17 10:04 12.1e_head
> drwxr-xr-x 2 root root 4096 May 17 10:04 12.b_head
> # du -h /var/lib/ceph/osd/ceph-8/current/12.13_head/
> 121M    /var/lib/ceph/osd/ceph-8/current/12.13_head/
> ---
>
> After increasing that to 128 PGs we get this:
> ---
> # ls -la /var/lib/ceph/osd/ceph-8/current/ | grep "12\."
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.13_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.1e_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.2b_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.33_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.3e_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.4b_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.53_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.5e_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.6b_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.73_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.7e_head
> drwxr-xr-x 2 root root 4096 May 17 10:18 12.b_head
> # du -h /var/lib/ceph/osd/ceph-8/current/12.13_head/
> 25M     /var/lib/ceph/osd/ceph-8/current/12.13_head/
> ---
>
> Now this was fairly uneventful even on my crappy test cluster, given the
> small amount of data (which was mostly cached) and the fact that it's
> idle.
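Incidentally, the split pattern in that listing isn't random: each of the
original PGs splits into itself plus three predictable children (12.13 ->
12.33, 12.53, 12.73; 12.b -> 12.2b, 12.4b, 12.6b; and so on). A simplified
sketch of the "stable modulo" bucketing behind this, ignoring the rest of
Ceph's placement pipeline (object-name hashing, CRUSH mapping of PGs to
OSDs) and using made-up random hashes:

```python
import random

def ceph_stable_mod(x: int, b: int, bmask: int) -> int:
    # Like x % b, but when b grows toward the next power of two each
    # bucket splits into predictable children instead of reshuffling.
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

def pg_of(obj_hash: int, pg_num: int) -> int:
    # bmask is the smallest 2^k - 1 covering pg_num.
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return ceph_stable_mod(obj_hash, pg_num, bmask)

random.seed(1)
hashes = [random.getrandbits(32) for _ in range(10_000)]

# Objects that lived in PG 0x13 at pg_num=32 ...
in_13 = [h for h in hashes if pg_of(h, 32) == 0x13]
# ... land in exactly four PGs at pg_num=128.
print(sorted(hex(p) for p in {pg_of(h, 128) for h in in_13}))
# ['0x13', '0x33', '0x53', '0x73']
```

So roughly three quarters of each PG's objects move to new PG directories
on a split, which fits the 121M -> 25M drop in 12.13_head above.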
>
> However consider this with 100's of GB per PG and a busy cluster and you
> get the idea where massive and very disruptive I/O comes from.

Per above, I'll experiment with this, but my first thought is I suspect
that's moving object/data files around rather than copying data, so the
overhead is in directory operations rather than data copies - not that
directory operations are free either, of course.

>> Hmmm, is there a generic command-line(ish) way of determining the number
>> of OSDs involved in a pool?
>>
> Unless you have a pool with a very small pg_num and a very large cluster
> the answer usually tends to be "all of them".

Or, as in my case, several completely independent pools (i.e. different
OSDs) in the one cluster.

> And google ("ceph number of osds per pool") is your friend:
>
> http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd

Crap. And I was just looking at that very page yesterday, in the context
of the distribution of the PGs, and completely forgot about the SUM part.

Thanks for taking the time to respond.

Chris.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
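For the archives: counting the distinct OSDs behind one pool is the same
kind of awk pass as the cephnotes post's SUM trick - take the union of the
acting sets of that pool's PGs. A sketch against fabricated sample lines
(real `ceph pg dump` output has many more columns, and the column holding
the acting set varies between Ceph versions, so check your own dump first):

```shell
# Fabricated sample in the shape of "ceph pg dump" output, reduced to
# pgid ($1) and acting set ($2) for clarity.
sample='12.13 [3,8,5]
12.1e [8,2,3]
12.b  [5,3,9]
13.0  [0,1,2]'

# Distinct OSDs hosting pool 12: union of the acting sets of its PGs.
echo "$sample" | awk '
    $1 ~ /^12\./ {
        gsub(/[\[\]]/, "", $2)
        n = split($2, osds, ",")
        for (i = 1; i <= n; i++)
            if (!(osds[i] in seen)) { seen[osds[i]] = 1; count++ }
    }
    END { print count }'
# prints 5  (OSDs 2, 3, 5, 8, 9)
```

Against a live cluster you would pipe `ceph pg dump` in instead of the
sample, after confirming which field holds the acting set.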