> [root@ctplmon1 ~]# ceph osd dump | grep pool
> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 320144 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
> pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 320144 lfor 0/18964/18962 flags hashpspool stripe_width 0 application rgw
> pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 320144 lfor 0/127672/127670 flags hashpspool stripe_width 0 application rgw
> pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 320144 lfor 0/59850/59848 flags hashpspool stripe_width 0 application rgw
> pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 320144 lfor 0/51538/51536 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 315285 lfor 0/127830/127828 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> pool 7 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 320144 lfor 0/76474/76472 flags hashpspool stripe_width 0 application rgw
> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 320144 lfor 0/127784/214408 flags hashpspool,ec_overwrites stripe_width 12288 application rgw
> pool 10 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 320144 flags hashpspool,bulk stripe_width 0 application cephfs
> pool 11 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 4 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 320144 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs

---

Are you using HDDs, SSDs, or both? What does the PGs column at the right end of `ceph osd df` average? I'm still spinning up my brain this morning, but this seems reeeeeally low, like ~17 if all the OSDs are in the same device class. buckets.index, notably, should be way higher. Assuming that your OSDs are all identical, and thus that the index pool spans them all, I'd increase pg_num for the index pool and cephfs_metadata to 256 and for buckets.data to maybe 2048 (rough example commands below).

> Right now there are around 200 osds (5.5T) in a cluster, with around 25 waiting to be added.

5.5T seems like an unusual number. Are these old HDDs, or perhaps 3DWPD SSDs?
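To put that suggestion into commands, a rough sketch (using the pool names from your dump and the targets above; double-check them against your device classes and `ceph osd pool autoscale-status` before running anything) might be:

  # Sanity check first: per-pool autoscaler state and per-OSD PG counts
  ceph osd pool autoscale-status
  ceph osd df          # PGS column at the right

  # Keep the autoscaler from reverting the manual values on these pools
  ceph osd pool set default.rgw.buckets.index pg_autoscale_mode off
  ceph osd pool set cephfs_metadata pg_autoscale_mode off
  ceph osd pool set default.rgw.buckets.data pg_autoscale_mode off

  # Raise pg_num; on recent releases pgp_num follows and the split is
  # applied gradually, so expect additional backfill while it ramps up
  ceph osd pool set default.rgw.buckets.index pg_num 256
  ceph osd pool set cephfs_metadata pg_num 256
  ceph osd pool set default.rgw.buckets.data pg_num 2048

Afterwards, `ceph osd pool ls detail` should show pg_num stepping up toward the new targets.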
> Rok
>
> On Mon, Dec 23, 2024 at 4:16 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>>
>> > autoscale_mode for pg is on for a particular pool
>> > (default.rgw.buckets.data) and EC 3-2 is used. During the pool's lifetime
>> > I've seen the PG number change automatically once,
>>
>> pg_num for a given pool likes to be a power of 2, so either the relative usage of pools or the overall cluster fillage has to change substantially for a change to be triggered in many cases.
>>
>> > but now I am also considering changing the PG number manually after the backfills complete.
>>
>> If you do, be sure to disable the autoscaler for that pool.
>>
>> > Right now pg_num 512 pgp_num 512 is used and I am considering changing it
>> > to 1024. Do you think that would be too aggressive?
>>
>> Depends on how many OSDs you have and what the rest of the pools are like. Send us
>>
>> `ceph osd dump | grep pool`
>>
>> These days, assuming that your OSDs are BlueStore, chances are that going higher on pg_num won’t cause issues.
>>
>> >
>> > Rok
>> >
>> > On Sun, Dec 22, 2024 at 8:46 PM Alwin Antreich <alwin.antreich@xxxxxxxx>
>> > wrote:
>> >
>> >> Hi Rok,
>> >>
>> >> On Sun, 22 Dec 2024 at 20:19, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:
>> >>
>> >>> First I tried with osd reweight, waited a few hours, then osd crush
>> >>> reweight, then with pg-upmap from Laimis. The crush reweight seemed the
>> >>> most effective, but not for "all" of the OSDs I tried.
>> >>>
>> >>> Uh, I probably set ceph config set osd osd_max_backfills to a high
>> >>> number in the past; is it better to reduce it to 1 in steps, since a lot
>> >>> of backfilling is already going on?
>> >>>
>> >> Every time a backfill finishes, a new one will be placed in the queue. The
>> >> number of backfills won't go down as long as you don't lower the setting.
>> >> You can adjust it and see whether it improves the backfill process or not
>> >> (wait an hour or two).
>> >>
>> >>
>> >>>
>> >>> Output of commands in attachment.
>> >>>
>> >> There seems to be a low number of PGs for the RGW data pool compared to
>> >> the number of OSDs. Whether this is really an issue depends on the EC
>> >> profile and the size of a shard (`ceph pg <id> query`). But in general the
>> >> number of PGs is important, because too few of them will make each PG grow
>> >> larger. Backfilling such a PG then takes longer and more easily tilts the
>> >> usage of OSDs, as the algorithm places PGs pseudo-randomly without taking
>> >> their size into account.
>> >>
>> >> Should you need to adjust the number of PGs, I'd wait until the
>> >> backfilling to the HDDs has finished, as the change will create more data
>> >> movement.
>> >>
>> >> Cheers,
>> >> Alwin
>> >> croit GmbH, https://croit.io/
>> >>
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
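And on the osd_max_backfills discussion above, a rough sketch of checking what is set and stepping it back down; osd.0 below is just a stand-in for any OSD id, and the last line only matters on releases that run the mClock scheduler:

  # What is currently configured, and what an OSD is actually running with
  ceph config get osd osd_max_backfills
  ceph config show osd.0 osd_max_backfills

  # Step it down, then watch recovery and client impact for an hour or two
  # before going lower; in-flight backfills finish, new ones start more slowly
  ceph config set osd osd_max_backfills 3
  ceph config set osd osd_max_backfills 1

  # With the mClock scheduler this knob is ignored unless overridden:
  # ceph config set osd osd_mclock_override_recovery_settings true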