We've gotten a bit further: after looking at how this remapped count is
determined (pg_temp), we've found the PGs counted as being remapped:

root@ceph01:~# ceph osd dump |grep pg_temp
pg_temp 3.7af [93,1,29]
pg_temp 3.7bc [137,97,5]
pg_temp 3.7d9 [72,120,18]
pg_temp 3.7e8 [80,21,71]
pg_temp 3.7fd [74,51,8]

Looking at 3.7af:

root@ceph01:~# ceph pg map 3.7af
osdmap e15406 pg 3.7af (3.f) -> up [87,156,29] acting [87,156,29]

I'm unclear on why this is staying in pg_temp. Is there a way to clean it
up? I would have expected it to be cleaned up automatically, as per the
docs, but I might be missing something here. (A small loop to repeat this
check for every pg_temp entry at once is at the bottom of this message.)

On Thu, Aug 6, 2020 at 2:40 PM David Orman <ormandj@xxxxxxxxxxxx> wrote:

> Still haven't figured this out. We went ahead and upgraded the entire
> cluster to Podman 2.0.4, and in the process did OS/kernel upgrades and
> rebooted every node, one at a time. We've still got 5 PGs stuck in the
> 'remapped' state according to 'ceph -s', but 0 PGs in that state in the
> pg dump output. Does anybody have any suggestions on what to do about
> this?
>
> On Wed, Aug 5, 2020 at 10:54 AM David Orman <ormandj@xxxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> We see that we have 5 'remapped' PGs, but are unclear why, or what to
>> do about it. We shifted some target ratios for the autobalancer, and it
>> resulted in this state. While adjusting the ratios, we noticed two OSDs
>> go down, but we just restarted the containers for those OSDs with
>> podman, and they came back up. Here's the status output:
>>
>> ###################
>> root@ceph01:~# ceph status
>> INFO:cephadm:Inferring fsid x
>> INFO:cephadm:Inferring config x
>> INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
>>   cluster:
>>     id:     41bb9256-c3bf-11ea-85b9-9e07b0435492
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 5 daemons, quorum ceph01,ceph04,ceph02,ceph03,ceph05 (age 2w)
>>     mgr: ceph03.ytkuyr(active, since 2w), standbys: ceph01.aqkgbl,
>> ceph02.gcglcg, ceph04.smbdew, ceph05.yropto
>>     osd: 168 osds: 168 up (since 2d), 168 in (since 2d); 5 remapped pgs
>>
>>   data:
>>     pools:   3 pools, 1057 pgs
>>     objects: 18.00M objects, 69 TiB
>>     usage:   119 TiB used, 2.0 PiB / 2.1 PiB avail
>>     pgs:     1056 active+clean
>>              1    active+clean+scrubbing+deep
>>
>>   io:
>>     client:   859 KiB/s rd, 212 MiB/s wr, 644 op/s rd, 391 op/s wr
>>
>> root@ceph01:~#
>> ###################
>>
>> When I look at ceph pg dump, I don't see any marked as remapped:
>>
>> ###################
>> root@ceph01:~# ceph pg dump |grep remapped
>> INFO:cephadm:Inferring fsid x
>> INFO:cephadm:Inferring config x
>> INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
>> dumped all
>> root@ceph01:~#
>> ###################
>>
>> Any idea what might be going on, or how to recover? All OSDs are up,
>> and health is 'OK'. This is Ceph 15.2.4 deployed with cephadm in
>> containers, on Podman 2.0.3.
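
For completeness, here's a rough sketch of a loop to repeat the check from
the top of this message for every pg_temp entry in one pass. It assumes the
same root shell on ceph01 with the ceph CLI available, as in the
transcripts above, and only re-runs 'ceph osd dump' and 'ceph pg map'; it
does not change any cluster state:

# Pull every pg_temp pgid from the osdmap and print its current
# up/acting mapping, so each entry can be compared against the up set.
for pg in $(ceph osd dump | awk '/^pg_temp/ {print $2}'); do
    printf '%s: ' "$pg"
    ceph pg map "$pg"
done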