Placement Groups undersized after adding OSDs

Wido den Hollander <wido@xxxxxxxx> · Wed, 14 Nov 2018 15:38:36 +0100

Hi,

I'm in the middle of expanding a Ceph cluster, and while watching 'ceph -s'
I suddenly saw a bunch of Placement Groups go undersized.

My first thought was that one or more OSDs had failed, but none had.

So I checked and found these undersized Placement Groups:

PG       STATE                                              UP                UP_PRIMARY  ACTING       ACTING_PRIMARY
11.3b54  active+undersized+degraded+remapped+backfill_wait  [1795,639,1422]   1795        [1795,639]   1795
11.362f  active+undersized+degraded+remapped+backfill_wait  [1431,1134,2217]  1431        [1134,1468]  1134
11.3e31  active+undersized+degraded+remapped+backfill_wait  [1451,1391,1906]  1451        [1906,2053]  1906
11.50c   active+undersized+degraded+remapped+backfill_wait  [1867,1455,1348]  1867        [1867,2036]  1867
11.421e  active+undersized+degraded+remapped+backfilling    [280,117,1421]    280         [280,117]    280
11.700   active+undersized+degraded+remapped+backfill_wait  [2212,1422,2087]  2212        [2055,2087]  2055
11.735   active+undersized+degraded+remapped+backfilling    [772,1832,1433]   772         [772,1832]   772
11.d5a   active+undersized+degraded+remapped+backfill_wait  [423,1709,1441]   423         [423,1709]   423
11.a95   active+undersized+degraded+remapped+backfill_wait  [1433,1180,978]   1433        [978,1180]   978
11.a67   active+undersized+degraded+remapped+backfill_wait  [1154,1463,2151]  1154        [1154,2151]  1154
11.10ca  active+undersized+degraded+remapped+backfill_wait  [2012,486,1457]   2012        [2012,486]   2012
11.2439  active+undersized+degraded+remapped+backfill_wait  [910,1457,1193]   910         [910,1193]   910
11.2f7e  active+undersized+degraded+remapped+backfill_wait  [1423,1356,2098]  1423        [1356,2098]  1356

After searching I found that OSDs 1421, 1422, 1423, 1431, 1433, 1441,
1451, 1455, 1457 and 1463 are all running on the same, newly added host.
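The OSDs that each PG is waiting for can be pulled out of such a listing mechanically: for every PG, take the OSDs that are in the up set but not in the acting set. A minimal sketch, assuming the whitespace-separated column order of `ceph pg dump pgs_brief` (PG_STAT, STATE, UP, UP_PRIMARY, ACTING, ACTING_PRIMARY) with one PG per line; the function name `missing_from_acting` is hypothetical.

```shell
# Hypothetical helper (not part of Ceph): for each PG line on stdin, print the
# PG id followed by the OSDs that are in the UP set but not in the ACTING set.
# Assumes the column order of `ceph pg dump pgs_brief`:
#   PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
missing_from_acting() {
  awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {
    up = $3; acting = $5
    # Drop the brackets, keeping only digits and commas.
    gsub(/[^0-9,]/, "", up)
    gsub(/[^0-9,]/, "", acting)
    n = split(up, u, ",")
    m = split(acting, a, ",")
    # Reset the lookup table for this PG (portable array clear).
    split("", seen)
    for (i = 1; i <= m; i++) {
      seen[a[i]] = 1
    }
    out = ""
    for (i = 1; i <= n; i++) {
      if (!(u[i] in seen)) {
        out = out (out == "" ? "" : ",") u[i]
      }
    }
    print $1, out
  }'
}
```

For example: `ceph pg dump pgs_brief 2>/dev/null | grep undersized | missing_from_acting`. Applied to the rows above, it reports 1422 for 11.3b54; for a few PGs, such as 11.362f (1431 and 2217), it reports two OSDs.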

I checked:
- The host did not reboot
- The OSDs did not restart

The OSDs' up_thru is map epoch 646724, which dates from 11:05 this morning
(4.5 hours ago), about the same time these OSDs were added.
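The up_from and up_thru epochs of individual OSDs can be read from the plain-text `ceph osd dump`. A minimal sketch, assuming each OSD line has the form `osd.N up in weight W up_from E up_thru E down_at E ...`; the function name `osd_up_thru` is hypothetical, and translating an epoch into a wall-clock time is not covered here.

```shell
# Hypothetical helper (not part of Ceph): read plain-text `ceph osd dump`
# output on stdin and print up_from and up_thru for the OSD ids given as
# arguments. Assumes OSD lines look like:
#   osd.N up in weight W up_from E1 up_thru E2 down_at E3 ...
osd_up_thru() {
  awk -v ids="$*" '
    BEGIN {
      n = split(ids, list, " ")
      for (i = 1; i <= n; i++) {
        want["osd." list[i]] = 1
      }
    }
    ($1 in want) {
      from = ""; thru = ""
      # Scan key/value pairs; each value follows its key.
      for (i = 2; i < NF; i++) {
        if ($i == "up_from") from = $(i + 1)
        if ($i == "up_thru") thru = $(i + 1)
      }
      print $1, "up_from", from, "up_thru", thru
    }'
}
```

For example: `ceph osd dump | osd_up_thru 1422 1982` prints one `osd.N up_from E1 up_thru E2` line per requested OSD, in the order they appear in the dump.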

So these PGs are currently running on *2* replicas while they should be
running on *3*.
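A PG is reported as undersized when its acting set holds fewer OSDs than the pool's size, so this condition can be checked directly from the listing. A minimal sketch, assuming the same `ceph pg dump pgs_brief` column order and a pool size passed as the first argument (for example taken from `ceph osd pool get <pool> size`, where `<pool>` is a placeholder); the function name `short_acting` is hypothetical.

```shell
# Hypothetical helper (not part of Ceph): print the PGs whose ACTING set holds
# fewer OSDs than the pool size passed as $1, with the UP and ACTING set sizes.
# Assumes the column order of `ceph pg dump pgs_brief`:
#   PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
short_acting() {
  awk -v size="$1" '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {
    up = $3; acting = $5
    gsub(/[^0-9,]/, "", up)
    gsub(/[^0-9,]/, "", acting)
    nu = split(up, u, ",")
    na = split(acting, a, ",")
    # Undersized: fewer OSDs in the acting set than the pool size.
    if (na < size + 0) {
      print $1, "up=" nu, "acting=" na, "size=" size
    }
  }'
}
```

For example: `ceph pg dump pgs_brief 2>/dev/null | short_acting 3`. Every PG in the listing above would be reported as `up=3 acting=2 size=3`.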

We just added 8 nodes with 24 disks each to the cluster, but none of the
existing OSDs were touched.

Looking at PG 11.3b54, I see that OSD 1422 is a backfill target:

$ ceph pg 11.3b54 query | jq '.recovery_state'

The 'enter_time' of this state is about 30 minutes ago, which is roughly
when this happened.

'might_have_unfound' lists OSD 1982, which is in the same rack as 1422
(CRUSH replicates over racks), but that OSD is online as well.

Its up_thru is 647122, which is from about 30 minutes ago. However, that
ceph-osd process has been running since September and seems to be
functioning fine.

This confuses me, as I know that during such an expansion a PG normally
maps to size+1 OSDs until the backfill finishes.

The cluster is running Luminous 12.2.8 on CentOS 7.5.

Any ideas on what this could be?

Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com