Re: radosgw stopped working

> autoscale_mode is on for a particular pool
> (default.rgw.buckets.data), which uses EC 3-2. During the pool's lifetime
> I've seen the PG number change automatically once

pg_num for a given pool likes to be a power of 2, so in many cases either the relative usage of the pools or the overall cluster utilization has to change substantially before the autoscaler triggers a change.
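
If you want to sanity-check what the autoscaler currently intends per pool, have a look at

`ceph osd pool autoscale-status`

which shows each pool's current pg_num next to the autoscaler's target, so you can see whether it would fight a manual change.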

> but now I am also considering changing the PG number manually after the backfill completes.

If you do, be sure to disable the autoscaler for that pool.
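
For that pool it would be something like:

`ceph osd pool set default.rgw.buckets.data pg_autoscale_mode off`

Otherwise the autoscaler may simply scale the pool back to whatever it considers ideal.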

> Right now pg_num 512 pgp_num 512 is used and I am considering changing it
> to 1024. Do you think that would perhaps be too aggressive?

That depends on how many OSDs you have and what the rest of the pools look like. Send us the output of

`ceph osd dump | grep pool`

These days, assuming that your OSDs are BlueStore, chances are that going higher on pg_num won’t cause issues.  
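
If the numbers check out, the change itself is a single command, e.g.:

`ceph osd pool set default.rgw.buckets.data pg_num 1024`

If I recall correctly, on recent releases pgp_num follows pg_num on its own and the mons apply the split gradually rather than all at once.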

> 
> Rok
> 
> On Sun, Dec 22, 2024 at 8:46 PM Alwin Antreich <alwin.antreich@xxxxxxxx>
> wrote:
> 
>> Hi Rok,
>> 
>> On Sun, 22 Dec 2024 at 20:19, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:
>> 
>>> First I tried osd reweight, waited a few hours, then osd crush
>>> reweight, then pg-upmap from Laimis. It seems crush reweight was the most
>>> effective, but not for "all" the OSDs I tried.
>>> 
>>> Uh, I've probably set osd_max_backfills (ceph config set osd
>>> osd_max_backfills) to a high number in the past; would it be better to
>>> reduce it to 1 in steps, since a lot of backfilling is already going on?
>>> 
>> Every time a backfill finishes, a new one will be placed in the queue. The
>> number of concurrent backfills won't go down as long as you don't lower the
>> setting. You can adjust it and see whether it improves the backfill process
>> (wait an hour or two).
>> 
>> 
>>> 
>>> Output of commands in attachment.
>>> 
>> There seems to be a low number of PGs for the RGW data pool compared to
>> the number of OSDs. Whether this is really an issue depends on the EC
>> profile and the size of a shard (`ceph pg <id> query`). But in general the
>> number of PGs is important, because too few of them will make each one grow
>> larger. Backfilling a large PG then takes longer and more easily skews the
>> usage of the OSDs, as the algorithm places PGs pseudo-randomly and does
>> not take their size into account.
>> 
>> Should you need to adjust the number of PGs, I'd wait until the
>> backfilling to the HDDs has finished, as changing pg_num will create
>> additional data movement.
>> 
>> Cheers,
>> Alwin
>> croit GmbH, https://croit.io/
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



