Re: "Incomplete" pg's

Ok, some progress…

I’m describing what I did here, in the hope that it helps someone who ends up in the same predicament.

I used "ceph-objectstore-tool … —op mark-complete” to mark the incomplete pgs as complete on the primary OSD, and then brought the OSD up.  The incomplete pg now has a state of active+undersized+degraded+remapped+backfilling, and eventually transitioned to active+clean.  Some pg ended up in lost_unfound state.  This particular pg was “fixed” using “ceph pg <pg-num> mark_unfound_lost delete”.  Some OSDs flopped up-n-down for a few iterations (which was quite scary), but eventually settled.

Yes, some cephfs objects are bad now, and I have to go through the painstaking process of verifying checksums on all files (against backup) and restoring.  The bad part is that while the file attributes of the bad files match (size, date, etc.), the checksums do not.  That means I have to sync up with rsync -c, which is much slower.  But at least I have a healthy cluster!
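
The restore pass is then essentially this (both paths below are placeholders for my backup and cephfs mount points):

# rsync -avc /backup/path/ /mnt/cephfs/path/

The -c flag makes rsync compare full checksums instead of just size and mtime, which is what catches the silently corrupted files.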

George


> On Mar 8, 2022, at 1:53 PM, Kyriazis, George <george.kyriazis@xxxxxxxxx> wrote:
> 
> Thanks Eugen,
> 
> Yeah, unfortunately the OSDs have been replaced with new OSDs.  Currently the cluster is rebalancing.  I was thinking I would try the 'osd_find_best_info_ignore_history_les' trick after the cluster has calmed down and there is no extra traffic on the OSDs.
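> 
> If I understand the trick correctly, it amounts to something like the following (osd.12 below is just an example id), followed by forcing the OSD to re-peer, and removing the setting again afterwards:
> 
> # ceph config set osd.12 osd_find_best_info_ignore_history_les true
> # ceph osd down 12
> # ceph config rm osd.12 osd_find_best_info_ignore_history_les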
> 
> Thing is, any activity that I try on those pgs results in a hang.  I mapped files back to object IDs and did a "rados ... remove", but that hung too.
> 'unlink'ing the files seems to work from the cephfs point of view, but it doesn't look like the underlying object gets removed.  "ceph pg ls | grep incomplete" still lists the same number of objects in the pg.  Perhaps this is for the same reason that "rados remove" hung?  It looks like ceph really wants to return some data for the request, and waits, hoping that data becomes available.
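> 
> (In case it helps anyone doing the same mapping: a cephfs file's first data object is named after the file's inode number in hex, so the lookup goes along these lines; the pool name, file path, and printed object name below are placeholders:)
> 
> # ino=$(stat -c %i /mnt/cephfs/some/file)
> # printf '%x.00000000\n' "$ino"
> 100003d4b2a.00000000
> # rados -p cephfs_data rm 100003d4b2a.00000000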
> 
> As I said in a previous post, I did a "force_create_pg" on one pg, with a peculiar result: the object count went down to 0, but the pg is still marked as incomplete.
> 
> Thanks,
> 
> George
> 
>> -----Original Message-----
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: Tuesday, March 8, 2022 12:48 AM
>> To: ceph-users@xxxxxxx
>> Subject:  Re: "Incomplete" pg's
>> 
>> Hi,
>> 
>> IIUC the OSDs 3,4,5 have been removed while some PGs still refer to them,
>> correct? Have the OSDs been replaced with the same IDs? If not (so there
>> are currently no OSDs with IDs 3,4,5 in your osd tree), maybe marking them
>> as lost [1] would resolve the stuck PG creation, although I doubt it will
>> do anything if there aren't any OSDs with those IDs anymore. I haven't had
>> to mark an OSD lost myself yet, so I'm not sure of the consequences.
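>> 
>> (If you do go that route, marking lost is, I believe, just the following
>> per OSD id, assuming the id still exists in the cluster map:)
>> 
>> # ceph osd lost 3 --yes-i-really-mean-it
>> 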
>> There's a similar thread [2] where the situation got resolved, not by marking
>> the OSDs as lost but by using 'osd_find_best_info_ignore_history_les' which
>> I haven't used myself either. But maybe worth a shot?
>> 
>> 
>> [1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/
>> [2] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/G6MJF7PGCCW5JTC6R6UV2EXT54YGU3LG/
>> 
>> 
>> Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:
>> 
>>> Ok, I saw that there is now a "ceph osd force-create-pg" command.
>>> Not sure if it is a replacement for "ceph pg force_create_pg" or if it
>>> does something different.
>>> 
>>> I tried it, and it looked like it worked:
>>> 
>>> # ceph osd force-create-pg 1.353 --yes-i-really-mean-it
>>> pg 1.353 now creating, ok
>>> #
>>> 
>>> But the pg is still stuck in “incomplete” state.
>>> 
>>> Re-issuing the same command, I get:
>>> 
>>> # ceph osd force-create-pg 1.353 --yes-i-really-mean-it
>>> pg 1.353 already creating
>>> #
>>> 
>>> Which means that the request is queued up somewhere, however, the pg
>>> in question is still stuck in incomplete state:
>>> 
>>> # ceph pg ls | grep ^1\.353
>>> 1.353  0  0  0  0  0  0  0  0  incomplete  71m  0'0  54514:92  [4,6,22]p4  [4,6,22]p4  2022-02-28T15:47:37.794357-0600  2022-02-02T07:53:15.339511-0600
>>> #
>>> 
>>> How do I find out if it is stuck, or just plain queued behind some
>>> other request?
>>> 
>>> Thank you!
>>> 
>>> George
>>> 
>>> On Mar 7, 2022, at 12:09 PM, Kyriazis, George
>>> <george.kyriazis@xxxxxxxxx> wrote:
>>> 
>>> After some thought, I decided to try "ceph pg force_create_pg" on the
>>> incomplete pgs, as suggested by many online sources.
>>> 
>>> However, I got:
>>> 
>>> # ceph pg force_create_pg 1.353
>>> no valid command found; 10 closest matches:
>>> pg stat
>>> pg getmap
>>> pg dump [all|summary|sum|delta|pools|osds|pgs|pgs_brief...]
>>> pg dump_json [all|summary|sum|pools|osds|pgs...]
>>> pg dump_pools_json
>>> pg ls-by-pool <poolstr> [<states>...]
>>> pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
>>> pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
>>> pg ls [<pool:int>] [<states>...]
>>> pg dump_stuck [inactive|unclean|stale|undersized|degraded...]
>>> [<threshold:int>]
>>> Error EINVAL: invalid command
>>> #
>>> 
>>> ?
>>> 
>>> I am running Pacific 16.2.7.
>>> 
>>> Thanks!
>>> 
>>> George
>>> 
>>> 
>>> On Mar 4, 2022, at 7:51 AM, Kyriazis, George
>>> <george.kyriazis@xxxxxxxxx> wrote:
>>> 
>>> Thanks Janne,
>>> 
>>> (Inline)
>>> 
>>> On Mar 4, 2022, at 1:04 AM, Janne Johansson
>>> <icepic.dz@xxxxxxxxx> wrote:
>>> 
>>> Due to a mistake on my part, I accidentally destroyed more OSDs than I
>>> needed to, and I ended up with 2 pgs in "incomplete" state.
>>> 
>>> Doing "ceph pg query" on one of the incomplete pgs, I get the
>>> following (somewhere in the output):
>>> 
>>>         "up": [
>>>             12,
>>>             6,
>>>             20
>>>         ],
>>>         "acting": [
>>>             12,
>>>             6,
>>>             20
>>>         ],
>>>         "avail_no_missing": [],
>>>         "object_location_counts": [],
>>>         "blocked_by": [
>>>             3,
>>>             4,
>>>             5
>>>         ],
>>>         "up_primary": 12,
>>>         "acting_primary": 12,
>>>         "purged_snaps": []
>>> 
>>> 
>>> I am assuming this means that OSDs 3,4,5 were the original ones (that
>>> are now destroyed), but I don't understand why the output shows 12, 6,
>>> 20 as acting.
>>> 
>>> I can't help with the cephfs part since we don't use that, but I think
>>> the above output means "since 3,4,5 are gone, 12,6 and 20 are now
>>> designated as the replacement OSDs to hold the PG", but since 3,4,5
>>> are gone, none of them can backfill into 12,6,20, so 12,6,20 are
>>> waiting for this PG to appear "somewhere" so they can recover.
>>> 
>>> I thought that if that were the case, 3,4,5 would be listed as
>>> "acting", with 12,6,20 as "up".
>>> 
>>> My concern about cephfs is that, since it is a layer above the ceph
>>> base layer, the corrective action may need to start at the cephfs
>>> level; otherwise cephfs won't be aware of any changes happening
>>> underneath.
>>> 
>>> Perhaps you can force pg creation, so that 12,6,20 get an empty PG and
>>> the pool can start again, and then hope that the next rsync will fill
>>> in any missing slots, but I am not so sure about this part since I
>>> don't know what other data apart from file contents may exist in a
>>> cephfs pool.
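>>> 
>>> (If I remember right, the command is something like the following on
>>> recent releases, though the exact name has changed over time, so check
>>> the docs for your version:)
>>> 
>>> # ceph osd force-create-pg 1.353 --yes-i-really-mean-it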
>>> 
>>> Is the worst-case (dropping the pool, recreating it and running a full
>>> rsync again) a possible way out? If so, you can perhaps test and see
>>> if you can bridge the gap of the missing PGs, but if resyncing is out,
>>> then wait for suggestions from someone more qualified at cephfs stuff
>>> than me. ;)
>>> 
>>> I’ll wait a bit more for other people to suggest something.  At this
>>> point I don’t have anything that I’m confident will work.
>>> 
>>> Thanks!
>>> 
>>> George
>>> 
>>> 
>>> --
>>> May the most significant bit of your life be positive.
>>> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



