Re: Placement Groups undersized after adding OSDs


This is weird. Can you capture the pg query for one of them and narrow down in which epoch it “lost” the previous replica and see if there’s any evidence of why?
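For instance, one way to narrow that down is to walk the past intervals in the pg query output and find the first epoch whose acting set no longer contains the replica. A sketch below; the JSON is an illustrative, heavily trimmed stand-in for what `ceph pg 11.3b54 query` returns, and the epochs/OSD IDs are made up from the thread, not real output:

```python
import json

# Illustrative excerpt of a pg query dump; a real one has many more fields.
pg_query = json.loads("""
{
  "info": {
    "pgid": "11.3b54",
    "history": {
      "past_intervals": [
        {"first": "646700", "last": "646723",
         "up": [1795, 639, 1422], "acting": [1795, 639, 1422]},
        {"first": "646724", "last": "647121",
         "up": [1795, 639, 1422], "acting": [1795, 639]}
      ]
    }
  }
}
""")

def first_epoch_replica_lost(query, osd):
    """Return the first epoch of the earliest past interval whose
    acting set no longer contains the given OSD, or None."""
    for interval in query["info"]["history"]["past_intervals"]:
        if osd not in interval["acting"]:
            return int(interval["first"])
    return None

print(first_epoch_replica_lost(pg_query, 1422))  # prints 646724
```

With the epoch in hand, `ceph osd dump <epoch>` on the maps around it should show what changed there.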
On Wed, Nov 14, 2018 at 8:09 PM Wido den Hollander <wido@xxxxxxxx> wrote:
Hi,

I'm in the middle of expanding a Ceph cluster and while having 'ceph -s'
open I suddenly saw a bunch of Placement Groups go undersized.

My first thought was that one or more OSDs had failed, but none did.

So I checked and I saw these Placement Groups undersized:

PG_ID    STATE                                              UP                UP_PRIMARY  ACTING       ACTING_PRIMARY
11.3b54  active+undersized+degraded+remapped+backfill_wait  [1795,639,1422]   1795        [1795,639]   1795
11.362f  active+undersized+degraded+remapped+backfill_wait  [1431,1134,2217]  1431        [1134,1468]  1134
11.3e31  active+undersized+degraded+remapped+backfill_wait  [1451,1391,1906]  1451        [1906,2053]  1906
11.50c   active+undersized+degraded+remapped+backfill_wait  [1867,1455,1348]  1867        [1867,2036]  1867
11.421e  active+undersized+degraded+remapped+backfilling    [280,117,1421]    280         [280,117]    280
11.700   active+undersized+degraded+remapped+backfill_wait  [2212,1422,2087]  2212        [2055,2087]  2055
11.735   active+undersized+degraded+remapped+backfilling    [772,1832,1433]   772         [772,1832]   772
11.d5a   active+undersized+degraded+remapped+backfill_wait  [423,1709,1441]   423         [423,1709]   423
11.a95   active+undersized+degraded+remapped+backfill_wait  [1433,1180,978]   1433        [978,1180]   978
11.a67   active+undersized+degraded+remapped+backfill_wait  [1154,1463,2151]  1154        [1154,2151]  1154
11.10ca  active+undersized+degraded+remapped+backfill_wait  [2012,486,1457]   2012        [2012,486]   2012
11.2439  active+undersized+degraded+remapped+backfill_wait  [910,1457,1193]   910         [910,1193]   910
11.2f7e  active+undersized+degraded+remapped+backfill_wait  [1423,1356,2098]  1423        [1356,2098]  1356

After searching I found that OSDs 1421, 1422, 1423, 1431, 1433, 1441,
1451, 1455, 1457 and 1463 are all running on the same newly added host.

I checked:
- The host did not reboot
- The OSDs did not restart

The OSDs have been up_thru since map 646724, which is from 11:05 this
morning (4.5 hours ago), about the same time they were added.

So these PGs are currently running on *2* replicas while they should be
running on *3*.

We just added 8 nodes with 24 disks each to the cluster, but none of the
existing OSDs were touched.

When looking at PG 11.3b54 I see that 1422 is a backfill target:

$ ceph pg 11.3b54 query|jq '.recovery_state'

The 'enter_time' for this is about 30 minutes ago, which is about when
this happened.

'might_have_unfound' points to OSD 1982, which is in the same rack as
1422 (CRUSH replicates over racks), but that OSD is also online.

Its up_thru is 647122, which is from about 30 minutes ago. That
ceph-osd process has, however, been running since September and seems
to be functioning fine.

This confuses me, as during such an expansion a PG normally maps to
size+1 until the backfill finishes.
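That is, during backfill the acting set normally keeps the old replica while the new target fills up, so the PG should read remapped but not undersized. A toy illustration of those two flags (pool size 3 per this post; this is not Ceph's actual peering logic, just the bookkeeping):

```python
POOL_SIZE = 3  # replication size of the pool, per the post

def classify(up, acting, size=POOL_SIZE):
    """Rough sketch of two PG state flags from the up/acting sets."""
    states = []
    if len(acting) < size:          # fewer live replicas than size
        states.append("undersized")
    if set(up) != set(acting):      # data not yet where CRUSH wants it
        states.append("remapped")
    return states

# Expected during backfill: old replica 1422 still acting, 2055 is a
# hypothetical new backfill target in up -> remapped only.
print(classify(up=[1795, 639, 2055], acting=[1795, 639, 1422]))

# Observed here: the old replica vanished from acting entirely.
print(classify(up=[1795, 639, 1422], acting=[1795, 639]))
```

The second case is what the listing shows, and that's what makes it odd: the old replica dropped out of the acting set instead of serving until backfill completed.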

The cluster is running Luminous 12.2.8 on CentOS 7.5.

Any ideas on what this could be?

Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
