On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote:
> On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote:
> >
> > pg_num is the actual amount of PGs. This you can increase without any
> > actual data moving.
>
> Yes and no.
>
> Increasing the pg_num will split PGs, which causes potentially massive I/O.
> Also AFAIK that I/O isn't regulated by the various recovery and backfill
> parameters.

Where is this potentially massive I/O coming from?

I have this naive concept that PGs are mathematically calculated buckets,
so splitting them would involve little or no I/O, although I can imagine
there is management overhead (CPU, memory) in correctly maintaining state
during the splitting process.

> That's probably why recent Ceph versions will only let you increase pg_num
> in smallish increments.

Oh, I wasn't aware of that!

Ok, so it looks like it's mon_osd_max_split_count, introduced by commit
d8ccd73. Unfortunately it seems to be missing from the Ceph docs. It is
mentioned in the SUSE docs:

https://www.suse.com/documentation/ses-2/singlehtml/book_storage_admin/book_storage_admin.html#storage.bp.cluster_mntc.add_pgnum

...although, if I'm understanding mon_osd_max_split_count correctly, their
script for calculating the maximum value you can raise pg_num to is wrong:
it computes "current pg_num + mon_osd_max_split_count" when it should be
"current pg_num + (mon_osd_max_split_count * number of OSDs serving the
pool)". (I've put a worked example at the end of this mail.)

Hmmm, is there a generic command-line(ish) way of determining the number
of OSDs involved in a pool? (Rough attempt at the end of this mail.)

> Moving data (as in redistributing amongst the OSD based on CRUSH) will
> indeed not happen until pgp_num is also increased.
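
Going back to the mon_osd_max_split_count arithmetic, with made-up numbers
(and assuming 32 is still the default for that option): a pool currently at
pg_num 512 whose PGs are spread over 24 OSDs could be raised to at most
512 + (32 * 24) = 1280 in a single step, whereas the SUSE script would only
allow 512 + 32 = 544.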
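
As for counting a pool's OSDs, something along the lines of the untested
sketch below ought to work, by collecting the acting set of every PG in the
pool. The pool name is a placeholder, and I'm assuming
"ceph pg ls-by-pool <pool> -f json" returns a list of per-PG records with an
"acting" field (newer releases may wrap that list in an outer object), so
adjust for your version:

#!/usr/bin/env python
# Untested sketch: count the distinct OSDs that appear in the acting sets
# of a pool's PGs, as reported by "ceph pg ls-by-pool <pool> -f json".
import json
import subprocess
import sys

pool = sys.argv[1] if len(sys.argv) > 1 else "rbd"  # pool name is a placeholder

out = subprocess.check_output(["ceph", "pg", "ls-by-pool", pool, "-f", "json"])
pgs = json.loads(out.decode("utf-8"))
if isinstance(pgs, dict):
    # some releases wrap the PG list in an outer object, e.g. {"pg_stats": [...]}
    pgs = pgs.get("pg_stats", [])

osds = set()
for pg in pgs:
    osds.update(pg.get("acting", []))

print("%s: %d PGs spread over %d OSDs" % (pool, len(pgs), len(osds)))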