Hello,

On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote:

> On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote:
> > On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote:
> > >
> > > pg_num is the actual amount of PGs. This you can increase without any
> > > actual data moving.
> >
> > Yes and no.
> >
> > Increasing the pg_num will split PGs, which causes potentially massive
> > I/O. Also AFAIK that I/O isn't regulated by the various recovery and
> > backfill parameters.
>
> Where is this potentially massive I/O coming from? I have this naive
> concept that the PGs are mathematically-calculated buckets, so splitting
> them would involve little or no I/O, although I can imagine there are
> management overheads (cpu, memory) involved in correctly maintaining
> state during the splitting process.
>

I would have thought "splitting" to be pretty unambiguous, in that it
involves moving data. That's on top, of course, of the CPU/RAM resources
needed when creating those new PGs and having them peer.

Most of your questions would be easily answered if you spent a few
minutes with even the crappiest test cluster and observed things (with
atop and the like).

To wit, this is a test pool (12) created with 32 PGs and slightly filled
with data via rados bench:
---
# ls -la /var/lib/ceph/osd/ceph-8/current/ |grep "12\."
drwxr-xr-x 2 root root 4096 May 17 10:04 12.13_head
drwxr-xr-x 2 root root 4096 May 17 10:04 12.1e_head
drwxr-xr-x 2 root root 4096 May 17 10:04 12.b_head
# du -h /var/lib/ceph/osd/ceph-8/current/12.13_head/
121M    /var/lib/ceph/osd/ceph-8/current/12.13_head/
---

After increasing that to 128 PGs we get this:
---
# ls -la /var/lib/ceph/osd/ceph-8/current/ |grep "12\."
drwxr-xr-x 2 root root 4096 May 17 10:18 12.13_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.1e_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.2b_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.33_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.3e_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.4b_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.53_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.5e_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.6b_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.73_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.7e_head
drwxr-xr-x 2 root root 4096 May 17 10:18 12.b_head
# du -h /var/lib/ceph/osd/ceph-8/current/12.13_head/
25M     /var/lib/ceph/osd/ceph-8/current/12.13_head/
---
Note how 12.13 went from 121MB down to 25MB: roughly three quarters of
its data was moved out into the newly created PGs.

Now this was fairly uneventful even on my crappy test cluster, given the
small amount of data (which was mostly cached) and the fact that it's
idle. However, consider this with hundreds of GB per PG on a busy
cluster and you get an idea of where the massive and very disruptive I/O
comes from.

> > That's probably why recent Ceph versions will only let you increase
> > pg_num in smallish increments.
>
> Oh, I wasn't aware of that!
>
> Ok, so it looks like it's mon_osd_max_split_count, introduced by commit
> d8ccd73. Unfortunately it seems to be missing from the ceph docs. It's
> mentioned in the Suse docs:
>
> https://www.suse.com/documentation/ses-2/singlehtml/book_storage_admin/book_storage_admin.html#storage.bp.cluster_mntc.add_pgnum
>
> ...although, if I'm understanding "mon_osd_max_split_count" correctly,
> their script for calculating the maximum to which you can increase
> pg_num is incorrect in that it's calculating "current pg_num +
> mon_osd_max_split_count" when it should be "current pg_num +
> (mon_osd_max_split_count * number of pool OSDs)".
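
For what it's worth, that reading is easy enough to sanity-check. A
quick sketch; the monitor name "a", the 24 OSDs, the pg_num of 512 and
the split count of 32 (which I believe is the default) below are just
example values, adjust them for your cluster:
---
# Ask a monitor for the value actually in effect (run this on the mon
# host and replace "a" with your monitor's ID):
ceph daemon mon.a config get mon_osd_max_split_count

# If the value is 32 and the pool's PGs are spread over 24 OSDs, then
# under the reading above a pool at pg_num=512 could be raised by at
# most 32 * 24 = 768 PGs in one step, i.e. to pg_num=1280.
---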
>
> Hmmm, is there a generic command-line(ish) way of determining the number
> of OSDs involved in a pool?
>

Unless you have a pool with a very small pg_num and a very large cluster,
the answer usually tends to be "all of them".

And google ("ceph number of osds per pool") is your friend:
http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com