Re: radosgw stopped working

Rok Jaklič <rjaklic@xxxxxxxxx> · Mon, 23 Dec 2024 07:28:40 +0100

However, I now see that the autoscaler is probably not working because of:

ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.921+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool default.rgw.buckets.index won't scale due
to overlapping roots: {-1, -18}
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.923+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool default.rgw.buckets.data won't scale due
to overlapping roots: {-2, -1, -18}
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.929+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 1 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.929+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 2 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.930+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 3 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.931+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 4 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.931+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 5 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.932+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 6 contains an overlapping root -18...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.932+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 7 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.933+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 9 contains an overlapping root -2...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.934+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 10 contains an overlapping root -1...
skipping scaling
ceph-mgr.ctplmon1.log:2024-12-23T07:12:00.934+0100 7f949edad640  0
[pg_autoscaler WARNING root] pool 11 contains an overlapping root -1...
skipping scaling
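One quick way to tabulate which pools the autoscaler skipped, and which CRUSH roots it flagged for each, is to parse the mgr log. The Python sketch below uses a hypothetical helper, `summarize_overlaps`, which is not part of Ceph. Its regular expressions are modelled only on the two warning formats quoted above and may need adjusting for other releases.

```python
import re
from collections import defaultdict

# Both patterns are modelled on the mgr log excerpt above (an assumption);
# other Ceph releases may word these warnings differently.
NAMED = re.compile(r"pool (\S+) won't scale due to overlapping roots: \{([^}]*)\}")
NUMBERED = re.compile(r"pool (\d+) contains an overlapping root (-?\d+)")


def summarize_overlaps(log_text: str) -> dict:
    """Map each pool (name or numeric id) to the set of CRUSH root ids flagged for it."""
    # Mail clients wrap long log lines, so collapse all whitespace (including
    # newlines) into single spaces before matching.
    text = " ".join(log_text.split())
    roots = defaultdict(set)
    for pool, ids in NAMED.findall(text):
        # "{-2, -1, -18}" -> {-2, -1, -18}; int() tolerates the leading spaces.
        roots[pool].update(int(i) for i in ids.split(","))
    for pool, root in NUMBERED.findall(text):
        roots[pool].add(int(root))
    return dict(roots)
```

It can be fed the whole file, e.g. `summarize_overlaps(open("ceph-mgr.ctplmon1.log").read())`. The root IDs it reports can then be looked up in `ceph osd crush tree --show-shadow`, where an ID such as -18 may turn out to be a device-class shadow root (for example `default~hdd`) rather than a separate tree.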

Rok

On Mon, Dec 23, 2024 at 6:45 AM Rok Jaklič <rjaklic@xxxxxxxxx> wrote:

> PG autoscale_mode is on for this pool (default.rgw.buckets.data), which
> uses EC 3+2. During the pool's lifetime I've seen the PG number change
> automatically once, but now I am also considering changing the PG number
> manually after the backfills complete.
>
> Right now pg_num 512 and pgp_num 512 are used, and I am considering
> changing them to 1024. Do you think that would be too aggressive?
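To judge whether going from 512 to 1024 PGs is aggressive, it helps to look at how many PG shards each OSD would carry. In an EC 3+2 pool every PG is stored as 5 shards on 5 different OSDs, so the pool contributes pg_num * 5 shards in total. The Python sketch below does that arithmetic; the OSD count of 60 is purely hypothetical and should be replaced with the number of OSDs the pool's CRUSH rule actually maps to.

```python
def pg_shards_per_osd(pg_num: int, k: int, m: int, num_osds: int) -> float:
    """Average number of PG shards one EC k+m pool places on each OSD."""
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    # Each PG is split into k data + m coding shards on k+m distinct OSDs,
    # so the pool has pg_num * (k + m) shards spread over num_osds OSDs.
    return pg_num * (k + m) / num_osds


if __name__ == "__main__":
    HYPOTHETICAL_OSDS = 60  # placeholder, not taken from this cluster
    for pg_num in (512, 1024):
        per_osd = pg_shards_per_osd(pg_num, k=3, m=2, num_osds=HYPOTHETICAL_OSDS)
        print(f"pg_num={pg_num}: ~{per_osd:.1f} shards per OSD")
```

With the placeholder of 60 OSDs this prints about 42.7 and 85.3 shards per OSD. The autoscaler works towards `mon_target_pg_per_osd` summed over all pools on the same OSDs (the value for a given release can be read with `ceph config get mon mon_target_pg_per_osd`), so shards from the other pools sharing those OSDs have to be added in before comparing.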
>
> Rok
>
> On Sun, Dec 22, 2024 at 8:46 PM Alwin Antreich <alwin.antreich@xxxxxxxx>
> wrote:
>
>> Hi Rok,
>>
>> On Sun, 22 Dec 2024 at 20:19, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:
>>
>>> First I tried osd reweight, waited a few hours, then osd crush
>>> reweight, then pg-upmap from Laimis. It seems crush reweight was the most
>>> effective, but not for "all" OSDs I tried.
>>>
>>> Uh, I probably set osd_max_backfills (ceph config set osd
>>> osd_max_backfills) to a high number in the past. Is it better to reduce
>>> it to 1 in steps, since a lot of backfilling is already going on?
>>>
>> Every time a backfill finishes, a new one from the queue takes its place,
>> so the number of concurrent backfills won't drop unless you lower the
>> setting. You can adjust it and see whether it improves the backfill
>> process (wait an hour or two after each change).
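Rok's plan of lowering the value in steps can be sketched as a helper that only builds the `ceph config set` commands, so each one can be run by hand and left for an hour or two before the next, as suggested above. `backfill_step_down` and the starting value of 8 used below are hypothetical; the current setting can be read with `ceph config get osd osd_max_backfills`.

```python
def backfill_step_down(current: int, target: int = 1, step: int = 1) -> list:
    """Build the commands that lower osd_max_backfills from current to target.

    Hypothetical helper: it does not talk to the cluster, it only returns
    the command strings to run one at a time.
    """
    if target < 1 or step < 1 or current < target:
        raise ValueError("need current >= target >= 1 and step >= 1")
    commands = []
    value = current
    while value > target:
        # Never undershoot the target on the last step.
        value = max(target, value - step)
        commands.append(f"ceph config set osd osd_max_backfills {value}")
    return commands
```

For example, `backfill_step_down(8, step=3)` yields three commands setting 5, 2 and finally 1. On releases where the OSDs use the mClock op scheduler, changes to `osd_max_backfills` may be ignored unless `osd_mclock_override_recovery_settings` is enabled, so the configuration reference for the running version is worth checking first.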
>>
>>
>>>
>>> Output of commands in attachment.
>>>
>> There seem to be few PGs for the rgw data pool compared to the number
>> of OSDs. Whether this is really an issue depends on the EC profile and
>> the size of a shard (`ceph pg <id> query`). In general, though, the
>> number of PGs matters: with too few PGs, each PG grows larger. Backfilling
>> a PG then takes longer, and OSD usage is more easily skewed, because CRUSH
>> places PGs pseudo-randomly without taking their size into account.
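The effect of PG count on PG size is simple arithmetic: assuming data is spread evenly, one shard of an EC k+m PG holds about STORED / pg_num / k bytes, so doubling pg_num roughly halves the data behind each PG backfill. The sketch below uses a made-up STORED value of 300 TiB; the real figure for the pool is in the STORED column of `ceph df`.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def avg_shard_bytes(pool_stored_bytes: float, pg_num: int, k: int) -> float:
    """Rough average size of one PG shard in an EC k+m pool.

    Assumes data is spread evenly over PGs. Each PG is cut into k data
    chunks (plus m coding chunks of the same size), so one shard holds
    about 1/k of the PG's data.
    """
    return pool_stored_bytes / pg_num / k


if __name__ == "__main__":
    stored = 300 * TIB  # hypothetical STORED value, not from this cluster
    for pg_num in (512, 1024):
        size = avg_shard_bytes(stored, pg_num, k=3)
        print(f"pg_num={pg_num}: ~{size / GIB:.0f} GiB per shard")
```

With those placeholder numbers it prints 200 GiB per shard at pg_num 512 and 100 GiB at 1024.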
>>
>> Should you need to adjust the number of PGs, I'd wait until the
>> backfilling to the HDDs has finished, as the PG change will create more
>> data movement.
>>
>> Cheers,
>> Alwin
>> croit GmbH, https://croit.io/
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx