On Fri, Jan 12, 2018 at 11:27 AM, Subhachandra Chandra
<schandra@xxxxxxxxxxxx> wrote:
> Hello,
>
> We are running experiments on a Ceph cluster before we move data onto it.
> While trying to increase the number of PGs on one of the pools, it threw
> the following error:
>
> root@ctrl1:/# ceph osd pool set data pg_num 65536
> Error E2BIG: specified pg_num 65536 is too large (creating 32768 new PGs
> on ~540 OSDs exceeds per-OSD max of 32)

That comes from here:

https://github.com/ceph/ceph/blob/5d7813f612aea59239c8375aaa00919ae32f952f/src/mon/OSDMonitor.cc#L6027

So the warning is triggered because new_pgs (65536 - 32768 = 32768) >
g_conf->mon_osd_max_split_count (32) * expected_osds (540), i.e.
32768 > 17280. (The arithmetic is worked through in the sketch appended
below.)

> There are 2 pools, named "data" and "metadata". "data" is an erasure-coded
> pool (6,3) and "metadata" is a replicated pool with a replication factor
> of 3.
>
> root@ctrl1:/# ceph osd lspools
> 1 metadata,2 data,
> root@ctrl1:/# ceph osd pool get metadata pg_num
> pg_num: 512
> root@ctrl1:/# ceph osd pool get data pg_num
> pg_num: 32768
>
>   osd: 540 osds: 540 up, 540 in
>        flags noout,noscrub,nodeep-scrub
>
>   data:
>     pools:   2 pools, 33280 pgs
>     objects: 7090k objects, 1662 TB
>     usage:   2501 TB used, 1428 TB / 3929 TB avail
>     pgs:     33280 active+clean
>
> The current PG/OSD ratio according to my calculation should be about 549:
>
> >>> (32768 * 9 + 512 * 3) / 540.0
> 548.9777777777778
>
> Increasing the number of PGs in the "data" pool should increase the
> PG/OSD ratio to about 1095:
>
> >>> (65536 * 9 + 512 * 3) / 540.0
> 1095.111111111111
>
> In the config, the settings related to the PG/OSD ratio look like:
>
> mon_max_pg_per_osd = 1500
> osd_max_pg_per_osd_hard_ratio = 1.0
>
> Trying to increase the number of PGs to 65536 throws the previously
> mentioned error, yet the new PG/OSD ratio is still under the configured
> limit. Why do we see the error? Further, there seems to be a bug in the
> error message where it says "exceeds per-OSD max of 32", in terms of
> where the "32" comes from.

Maybe the wording could be better. Perhaps "exceeds per-OSD max with
mon_osd_max_split_count of 32". I'll submit this and see how it goes.

> P.S. I understand that the PG/OSD ratio configured on this cluster far
> exceeds the recommended values. The experiment is to find scaling limits
> and try out expansion scenarios.
>
> Thanks
> Subhachandra

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
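
For anyone following along, here is a rough, illustrative Python sketch of
the split-count check Brad points at above. It mirrors the condition in the
linked OSDMonitor.cc (new_pgs > mon_osd_max_split_count * expected_osds),
but the function name and message text below are mine, not the actual
monitor code; the numbers are just the ones from this thread.

# Rough sketch (not the real monitor code) of the split-count check in
# OSDMonitor.cc: a pg_num increase is rejected when the number of new PGs
# exceeds mon_osd_max_split_count * expected_osds.

def check_pg_num_increase(current_pg_num, requested_pg_num, expected_osds,
                          mon_osd_max_split_count=32):
    """Return None if the increase would be accepted, else the error text."""
    new_pgs = requested_pg_num - current_pg_num
    if new_pgs > mon_osd_max_split_count * expected_osds:
        return ("specified pg_num %d is too large (creating %d new PGs on "
                "~%d OSDs exceeds per-OSD max of %d)"
                % (requested_pg_num, new_pgs, expected_osds,
                   mon_osd_max_split_count))
    return None

# The case from this thread: 65536 - 32768 = 32768 new PGs, but the ceiling
# for a single pg_num bump is only 32 * 540 = 17280, hence E2BIG.
print(check_pg_num_increase(32768, 65536, 540))

Note that this limit is per pg_num change, independent of the
mon_max_pg_per_osd / osd_max_pg_per_osd_hard_ratio settings quoted above,
which is why the error fires even though the resulting PG/OSD ratio would
still be under the configured limit.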