Re: ceph pg mark_unfound_lost delete results in confused ceph

Hi,

just in case someone else runs into this or a similar issue.

The following helped to solve the issue:

1. Restarting the active mgr

This brought the pg into the inactive state, without a last acting set:

pg 10.17 is stuck inactive for 18m, current state unknown, last acting []
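Just for completeness, "restarting the active mgr" means failing it over or restarting its service; the exact unit name depends on the deployment, so take these as a sketch rather than the exact commands used here:

# ceph mgr fail

or, on the node that runs the active mgr:

# systemctl restart ceph-mgr@<hostname>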


2. Recreating the pg (as it contained no data anyway):

ceph osd force-create-pg 10.17 --yes-i-really-mean-it


Before the command:

# ceph pg 10.17 query
Error ENOENT: i don't have pgid 10.17

After the command:

# ceph pg 10.17 query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+clean",
    "epoch": 14555,
    "up": [
        5,
        6
    ],
    "acting": [
        5,
        6
[...]
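The result can then be double-checked with the usual status commands (output omitted here):

# ceph health detail
# ceph pg map 10.17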

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
Layer7 Networks

mailto:info@xxxxxxxxxx

Anschrift:

Layer7 Networks GmbH
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 96293 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic
UST ID: DE259845632

On 15/01/2024 22:24, Oliver Dzombic wrote:
Hi,

After osd.15 died at the wrong moment, there is:

# ceph health detail

[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg stale
    pg 10.17 is stuck stale for 3d, current state stale+active+undersized+degraded, last acting [15]
[WRN] PG_DEGRADED: Degraded data redundancy: 172/57063399 objects degraded (0.000%), 1 pg degraded, 1 pg undersized
    pg 10.17 is stuck undersized for 3d, current state stale+active+undersized+degraded, last acting [15]

This will never resolve, as osd.15 no longer exists.

So the following was executed:

ceph pg 10.17 mark_unfound_lost delete
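For reference: the objects that such a delete would throw away can be listed beforehand with:

# ceph pg 10.17 list_unfound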


Ceph seems to be a bit confused about pg 10.17 now:

While this worked before, it's not working anymore:
# ceph pg 10.17 query
Error ENOENT: i don't have pgid 10.17


And while this was pointing to osd.15 before, the map now shows 5 and 6 (which is correct):
# ceph pg map 10.17
osdmap e14425 pg 10.17 (10.17) -> up [5,6] acting [5,6]



According to ceph health, ceph assumes that osd.15 is still somehow in charge.

The pg map seems to think that 10.17 is on osd.5 and osd.6.

But pg 10.17 does not seem to actually exist, as a query fails.
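One cross-check that could help (osd ids taken from the pg map above, so just a sketch): ask the OSDs which pgs they actually carry:

# ceph pg ls-by-osd osd.5 | grep 10.17
# ceph pg ls-by-osd osd.6 | grep 10.17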

Any idea what's going wrong and how to fix this?

Thank you!


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



