Re: "Incomplete" pg's

Eugen Block <eblock@xxxxxx> · Tue, 08 Mar 2022 06:47:45 +0000

Hi,

IIUC the OSDs 3,4,5 have been removed while some PGs still refer to  
them, correct? Have the OSDs been replaced with the same IDs? If not  
(so there are currently no OSDs with IDs 3,4,5 in your osd tree) maybe  
marking them as lost [1] would resolve the stuck PG creation, although  
I doubt that this will do anything if there aren't any OSDs with these  
IDs anymore. I haven't had to mark an OSD lost yet myself, so I'm not  
sure of the consequences.
There's a similar thread [2] where the situation got resolved, not by  
marking the OSDs as lost but by using  
'osd_find_best_info_ignore_history_les' which I haven't used myself  
either. But maybe worth a shot?
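
A minimal sketch of the "are there still OSDs with IDs 3,4,5 in the osd tree" check, assuming the JSON from `ceph osd tree -f json` has a top-level "nodes" list whose entries carry "id" and "type" fields. The helper name `missing_osd_ids` and the trimmed sample data are hypothetical, not taken from this cluster.

```python
import json


def missing_osd_ids(osd_tree, wanted_ids):
    """Return the ids from wanted_ids that have no node of type 'osd' in the tree."""
    present = {
        node["id"]
        for node in osd_tree.get("nodes", [])
        if node.get("type") == "osd"
    }
    return [osd_id for osd_id in wanted_ids if osd_id not in present]


# Hypothetical, heavily trimmed stand-in for `ceph osd tree -f json` output.
sample_tree = json.loads("""
{"nodes": [
  {"id": -1, "name": "default", "type": "root"},
  {"id": 6,  "name": "osd.6",  "type": "osd"},
  {"id": 12, "name": "osd.12", "type": "osd"},
  {"id": 20, "name": "osd.20", "type": "osd"}
], "stray": []}
""")

print(missing_osd_ids(sample_tree, [3, 4, 5]))
```

With real data, the dict would come from something like json.loads() on the saved command output. An ID reported as missing here only means it is absent from the tree; it does not say whether marking it lost is safe.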

[1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/
[2] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/G6MJF7PGCCW5JTC6R6UV2EXT54YGU3LG/

Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

Ok, I saw that there is now a “ceph osd force-create-pg” command.
Not sure if it is a replacement for “ceph pg force_create_pg” or if
it does something different.

I tried it, and it looked like it worked:

# ceph osd force-create-pg 1.353 --yes-i-really-mean-it
pg 1.353 now creating, ok
#

But the pg is still stuck in “incomplete” state.

Re-issuing the same command, I get:

# ceph osd force-create-pg 1.353 --yes-i-really-mean-it
pg 1.353 already creating
#

This suggests that the request is queued up somewhere. However, the pg
in question is still stuck in the incomplete state:

# ceph pg ls | grep ^1\.353
1.353  0  0  0  0  0  0  0  0  incomplete  71m  0'0  54514:92  [4,6,22]p4  [4,6,22]p4  2022-02-28T15:47:37.794357-0600  2022-02-02T07:53:15.339511-0600
#

How do I find out if it is stuck, or just plain queued behind some  
other request?

Thank you!

George

On Mar 7, 2022, at 12:09 PM, Kyriazis, George
<george.kyriazis@xxxxxxxxx> wrote:

After some thought, I decided to try “ceph pg force_create_pg” on
the incomplete pgs, as suggested by some online sources.

However, I got:

# ceph pg force_create_pg 1.353
no valid command found; 10 closest matches:
pg stat
pg getmap
pg dump [all|summary|sum|delta|pools|osds|pgs|pgs_brief...]
pg dump_json [all|summary|sum|pools|osds|pgs...]
pg dump_pools_json
pg ls-by-pool <poolstr> [<states>...]
pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
pg ls [<pool:int>] [<states>...]
pg dump_stuck [inactive|unclean|stale|undersized|degraded...] [<threshold:int>]
Error EINVAL: invalid command
#

?

I am running pacific 16.2.7.

Thanks!

George

On Mar 4, 2022, at 7:51 AM, Kyriazis, George
<george.kyriazis@xxxxxxxxx> wrote:

Thanks Janne,

(Inline)

On Mar 4, 2022, at 1:04 AM, Janne Johansson
<icepic.dz@xxxxxxxxx> wrote:

Due to a mistake on my part, I accidentally destroyed more OSDs than
I needed to, and I ended up with 2 pgs in “incomplete” state.

Doing “ceph pg query” on one of the pgs that is incomplete, I get the
following (somewhere in the output):

         "up": [
             12,
             6,
             20
         ],
         "acting": [
             12,
             6,
             20
         ],
         "avail_no_missing": [],
         "object_location_counts": [],
         "blocked_by": [
             3,
             4,
             5
         ],
         "up_primary": 12,
         "acting_primary": 12,
         "purged_snaps": []

I am assuming this means that OSDs 3,4,5 were the original ones
(that are now destroyed), but I don’t understand why the output
shows 12, 6, 20 as acting.

I can't help with the cephfs part since we don't use that, but I think
the above output means "since 3,4,5 are gone, 12,6 and 20 are now
designated as the replacement OSDs to hold the PG", but since 3,4,5
are gone, none of them can backfill into 12,6,20, so 12,6,20 are
waiting for this PG to appear "somewhere" so they can recover.
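
The reading above can be checked mechanically. Here is a minimal sketch, assuming only that the `ceph pg query` JSON contains "blocked_by" and "acting" keys somewhere in its structure (the exact nesting is not shown in this thread, so the code searches at any depth). The helper names and the sample, which reuses the fragment quoted above as a bare top-level object, are hypothetical.

```python
import json


def find_key(obj, key):
    """Yield every value stored under `key` at any depth of a parsed JSON value."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


def blocking_osds(pg_query):
    """Return OSD ids listed under 'blocked_by' that are not in any 'acting' set."""
    blocked, acting = set(), set()
    for value in find_key(pg_query, "blocked_by"):
        if isinstance(value, list):
            blocked.update(value)
    for value in find_key(pg_query, "acting"):
        if isinstance(value, list):
            acting.update(value)
    return sorted(blocked - acting)


# The fragment quoted in this thread, wrapped as a bare object for illustration.
sample_query = json.loads("""
{"up": [12, 6, 20], "acting": [12, 6, 20],
 "avail_no_missing": [], "object_location_counts": [],
 "blocked_by": [3, 4, 5],
 "up_primary": 12, "acting_primary": 12, "purged_snaps": []}
""")

print(blocking_osds(sample_query))
```

For the fragment above this lists 3, 4 and 5: the PG is mapped to 12,6,20 but peering is blocked on OSDs that are no longer in the acting set.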

I thought that if that was the case, 3,4,5 should be listed as
“acting”, with 12,6,20 as “up”.

My concern about cephfs is that, since it is a layer above the ceph
base layer, maybe the corrective action needs to start at
cephfs; otherwise cephfs won’t be aware of any changes happening
underneath.

Perhaps you can force pg creation, so that 12,6,20 gets an empty PG to
start the pool again, and then hope that the next rsync will fill in
any missing slots, but this part I am not so sure about since I don't
know what other data apart from file contents may exist in a cephfs
pool.

Is the worst-case (dropping the pool, recreating it and running a full
rsync again) a possible way out? If so, you can perhaps test and see
if you can bridge the gap of the missing PGs, but if resyncing is out,
then wait for suggestions from someone more qualified at cephfs stuff
than me. ;)

I’ll wait a bit longer for other people to suggest something.  At
this point I don’t have any option that I am highly confident will
work.

Thanks!

George

--
May the most significant bit of your life be positive.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
