Re: Abandon incomplete (damaged EC) pgs - How to manage the impact on cephfs?

Okay, I gave up, as I should have long ago.

Unlinking was a massive pain in the rear, even when scripted, as I
never found a good method to catch the hanging without hanging the
script, or the filesystem.

After several attempts, and because I only have a couple of primary
directories which were using the damaged pool, I simply removed the
pool from cephfs and deleted it outright, with the plan to move the
damaged directories into a hidden .dead dir to pseudo-fix the issue
for myself.

After the pool deletion, I noticed that the directories were
browsable, with all damaged files removed. However, I am unsure
whether this was because all files were unlinked by the script
successfully (the script didn't actually hang, it just hit a task
queue bug and never finished/closed) or whether removing the pool also
removed the files.

Either way, I am down 20T of (off-site recoverable) data, so off to
downloading I go! haha

Michael, Thank you for your help earlier. Hopefully this little saga
is useful to someone in future too!

Joshua

On Wed, Apr 14, 2021 at 7:08 AM Joshua West <josh@xxxxxxx> wrote:
>
> Additional to my last note, I should have mentioned that I am exploring
> options to delete the damaged data, but in hopes of preserving what I
> can before moving on to simply deleting all data on that pool.
>
> When trying to simply empty pgs, it seems like the pgs don't exist.
>
> In attempting to follow:
> https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-pgs-inactive-3-pgs-incomplete-b97cbcb4b5a1
> with regard to deleting pgs with zero objects/data, I receive:
>
> #ceph pg ls incomplete
> ....
> 47.3ff  0  0  0  0  0  0  0  0  incomplete  3m  0'0  527856:6054  [7,9,2]p7  [7,9,2]p7 ...
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info
> --pgid 47.3ff
> PG '47.3ff' not found
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op
> remove --pgid 47.3ff --force
> PG '47.3ff' not found
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op
> mark-complete --pgid 47.3ff
> PG '47.3ff' not found
>
> # ceph osd force-create-pg 47.3ff --yes-i-really-mean-it
> (worked, but I don't have the output handy) --> No change.
>
>
> Since I am having trouble with this process (I can't delete pgs, and
> can't get OIDs for incomplete pgs), I had the idea to start from the
> other end:
> Is there a method to determine which files are not stuck, so I can
> copy them out prior to deleting the whole pool?
>
> As we know, `ls` becomes stuck, so what is the best way to get a list
> of filepaths+filenames for cephfs?
> My current plan is to get that list and simply attempt to copy all
> files by brute force, with each copy in its own thread + timeout.
> Does this make sense?
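>
> A rough sketch of what I have in mind (mountpoint, destination dir,
> and the 30s timeout are all made up; untested):
>
>     # enumerate names only; avoid stat-heavy predicates like -size
>     find /mnt/cephfs/ecdir -type f > filelist.txt
>
>     # brute-force the copies, each attempt capped so a stuck file
>     # can't wedge the whole run; log failures for later
>     while read -r f; do
>         timeout 30 cp --parents "$f" /mnt/rescue/ || echo "$f" >> stuck.txt
>     done < filelist.txt
>
> (Parallelism could come from feeding filelist.txt to xargs -P instead.)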
>
>
> Joshua
>
> On Wed, Apr 14, 2021 at 6:03 AM Joshua West <josh@xxxxxxx> wrote:
> >
> > Just working this through, how does one identify the OIDs within a PG,
> > without list_unfound?
> >
> > I've been poking around, but can't seem to find a command that outputs
> > the necessary OIDs. I tried a handful of cephfs commands, but they of
> > course become stuck, and ceph pg commands haven't revealed the OID
> > yet.
> >
> > Joshua
> >
> >
> > Joshua West
> > President
> > 403-456-0072
> > CAYK.ca
> >
> >
> > On Fri, Apr 9, 2021 at 12:15 PM Joshua West <josh@xxxxxxx> wrote:
> > >
> > > Absolutely!
> > >
> > > Attached are the files; they're not duplicates, but revised versions
> > > (I tidied up what I could to make things easier).
> > >
> > > > Correct me if I'm wrong, but you are willing to throw away all of the data on this pool?
> > >
> > > Correct, if push comes to shove, I accept that data-loss is probable.
> > > If I can manage to save the data, I would definitely be okay with that
> > > too though.
> > >
> > > Still learning to program, but I know Python quite well. I am going to
> > > get started on a script to clean up per your previously noted steps, in
> > > the language I know! But I will hold off on unlinking everything for
> > > the moment.
> > >
> > > Thank you again for your time, your help has already been invaluable to me.
> > >
> > > Joshua
> > >
> > >
> > > Joshua West
> > > President
> > > 403-456-0072
> > > CAYK.ca
> > >
> > >
> > > On Fri, Apr 9, 2021 at 7:03 AM Michael Thomas <wart@xxxxxxxxxxx> wrote:
> > > >
> > > > Hi Joshua,
> > > >
> > > > I'll dig into this output a bit more later, but here are my thoughts
> > > > right now.  I'll preface this by saying that I've never had to clean up
> > > > from unrecoverable incomplete PGs, so some of what I suggest may not
> > > > work/apply or be the ideal fix in your case.
> > > >
> > > > Correct me if I'm wrong, but you are willing to throw away all of the
> > > > data on this pool?  This should make it easier because we don't have to
> > > > worry about recovering any lost data.
> > > >
> > > > If this is the case, then I think the general strategy would be:
> > > >
> > > > 1) Identify and remove any files/directories in cephfs that are located
> > > > on this pool (based on ceph.file.layout.pool=claypool and
> > > > ceph.dir.layout.pool=claypool).  Use 'unlink' instead of 'rm' to remove
> > > > the files; it should be less prone to hanging.
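> > > >
> > > > Something along these lines might work for step 1 (untested sketch;
> > > > /mnt/cephfs is a placeholder mountpoint, and directories are analogous
> > > > via ceph.dir.layout.pool):
> > > >
> > > >     # collect files whose layout points at the damaged pool
> > > >     find /mnt/cephfs -type f | while read -r f ; do
> > > >         p=$(getfattr -n ceph.file.layout.pool --only-values "$f" 2>/dev/null)
> > > >         [ "$p" = "claypool" ] && printf '%s\n' "$f"
> > > >     done > files_on_claypool.txt
> > > >
> > > >     # then remove them with unlink rather than rm
> > > >     while read -r f ; do unlink "$f" ; done < files_on_claypool.txt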
> > > >
> > > > 2) Wait a bit for ceph to clean up any unreferenced objects.  Watch the
> > > > output of 'ceph df' to see how many objects are listed for the pool.
> > > >
> > > > 3) Use 'rados -p claypool ls' to identify the remaining objects.  Use
> > > > the OID identifier to calculate the inode number of each file, then
> > > > search cephfs to identify which files these belong to.  I would expect
> > > > there to be none, as you already deleted the files in step 1.
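> > > >
> > > > A sketch of that mapping (untested; cephfs data objects are named
> > > > <inode-hex>.<block>, and the /mnt/cephfs mountpoint is a placeholder):
> > > >
> > > >     rados -p claypool ls | cut -d. -f1 | sort -u | while read -r h ; do
> > > >         ino=$(printf '%d' "0x$h")
> > > >         echo -n "$ino " ; find /mnt/cephfs -inum "$ino"
> > > >     done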
> > > >
> > > > 4) With nothing in the cephfs metadata referring to the objects anymore,
> > > > it should be safe to remove them with 'rados -p claypool rm <obj>'.
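> > > >
> > > > For example (untested sketch):
> > > >
> > > >     rados -p claypool ls | while read -r obj ; do
> > > >         rados -p claypool rm "$obj"
> > > >     done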
> > > >
> > > > 5) Remove the now-empty pool from cephfs
> > > >
> > > > 6) Remove the now-empty pool from ceph
> > > >
> > > > Can you also include the output of 'ceph df'?
> > > >
> > > > --Mike
> > > >
> > > > On 4/9/21 7:31 AM, Joshua West wrote:
> > > > > Thank you Mike!
> > > > >
> > > > > This is honestly a way more detailed reply than I was expecting.
> > > > > You've equipped me with new tools to work with.  Thank you!
> > > > >
> > > > > I don't actually have any unfound pgs... only "incomplete" ones, which
> > > > > limits the usefulness of:
> > > > > `grep recovery_unfound`
> > > > > `ceph pg $pg list_unfound`
> > > > > `ceph pg $pg mark_unfound_lost delete`
> > > > >
> > > > > I don't seem to see equivalent commands for incomplete pgs, save for
> > > > > grep of course.
> > > > >
> > > > > This does make me slightly more hopeful that recovery might be
> > > > > possible if the pgs are incomplete and stuck, but not unfound..? Not
> > > > > going to get my hopes too high.
> > > > >
> > > > > Going to attach a few items rather than paste everything inline; if
> > > > > anyone can take a glance, it would be appreciated.
> > > > >
> > > > > In the meantime, in the absence of the above commands, what's the best
> > > > > way to clean this up under the assumption that the data is lost?
> > > > >
> > > > > ~Joshua
> > > > >
> > > > >
> > > > > Joshua West
> > > > > President
> > > > > 403-456-0072
> > > > > CAYK.ca
> > > > >
> > > > >
> > > > > On Thu, Apr 8, 2021 at 6:15 PM Michael Thomas <wart@xxxxxxxxxxx> wrote:
> > > > >>
> > > > >> Hi Joshua,
> > > > >>
> > > > >> I have had a similar issue three different times on one of my cephfs
> > > > >> pools (15.2.10). The first time this happened I had lost some OSDs.  In
> > > > >> all cases I ended up with degraded PGs with unfound objects that could
> > > > >> not be recovered.
> > > > >>
> > > > >> Here's how I recovered from the situation.  Note that this will
> > > > >> permanently remove the affected files from ceph.  Restoring them from
> > > > >> backup is an exercise left to the reader.
> > > > >>
> > > > >> * Make a list of the affected PGs:
> > > > >>     ceph pg dump_stuck  | grep recovery_unfound > pg.txt
> > > > >>
> > > > >> * Make a list of the affected objects (OIDs):
> > > > >>     cat pg.txt | awk '{print $1}' \
> > > > >>       | while read pg ; do echo $pg ; ceph pg $pg list_unfound \
> > > > >>       | jq '.objects[].oid.oid' ; done | sed -e 's/"//g' > oid.txt
> > > > >>
> > > > >> * Convert the OID numbers to inodes using 'printf "%d\n" 0x${oid}' and
> > > > >> put the results in a file called 'inum.txt'
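> > > > >>
> > > > >> For example (a sketch; assumes oid.txt has been trimmed to just the
> > > > >> object names, which look like 10000020fbc.00000000):
> > > > >>     cut -d. -f1 oid.txt | sort -u | while read oid ; do \
> > > > >>       printf "%d\n" 0x${oid} ; done > inum.txt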
> > > > >>
> > > > >> * On a ceph client, find the files that correspond to the affected inodes:
> > > > >>     cat inum.txt | while read inum ; do echo -n "${inum} " ; \
> > > > >>       find /ceph/frames/O3/raw -inum ${inum} ; done > files.txt
> > > > >>
> > > > >> * It may be helpful to put this table of PG, OID, inum, and files into a
> > > > >> spreadsheet to keep track of what's been done.
> > > > >>
> > > > >> * On the ceph client, use 'unlink' to remove the files from the
> > > > >> filesystem.  Do not use 'rm', as it will hang while calling 'stat()' on
> > > > >> each file.  Even unlink may hang when you first try it.  If it does
> > > > >> hang, do the following to get it unstuck:
> > > > >>     - Reboot the client
> > > > >>     - Restart each mon and the mgr.  I rebooted each mon/mgr, but it may
> > > > >> be sufficient to restart the services without a reboot (see the sketch
> > > > >> after this list).
> > > > >>     - Try using 'unlink' again
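> > > > >>
> > > > >> (On a systemd-managed, non-containerized cluster, the no-reboot variant
> > > > >> would presumably be something like the following, run for each mon/mgr
> > > > >> host -- the hostname is a placeholder:)
> > > > >>     ssh mon1 'systemctl restart ceph-mon@mon1 ceph-mgr@mon1'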
> > > > >>
> > > > >> * After all of the affected files have been removed, go through the list
> > > > >> of PGs and remove the unfound OIDs:
> > > > >>     ceph pg $pgid mark_unfound_lost delete
> > > > >>
> > > > >> ...or if you're feeling brave, delete them all at once:
> > > > >>     cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; \
> > > > >>       ceph pg $pg mark_unfound_lost delete ; done
> > > > >>
> > > > >> * Watch the output of 'ceph -s' to see the health of the pools/pgs recover.
> > > > >>
> > > > >> * Restore the deleted files from backup, or decide that you don't care
> > > > >> about them and don't do anything.
> > > > >>
> > > > >> This procedure lets you fix the problem without deleting the affected
> > > > >> pool.  To be honest, the first time it happened, my solution was to
> > > > >> first copy all of the data off of the affected pool and onto a new
> > > > >> pool.  I later found this to be unnecessary.  But if you want to
> > > > >> pursue this, here's what I suggest:
> > > > >>
> > > > >> * Follow the steps above to get rid of the affected files.  I feel this
> > > > >> should still be done even though you don't care about saving the data,
> > > > >> to prevent corruption in the cephfs metadata.
> > > > >>
> > > > >> * Go through the entire filesystem and look for:
> > > > >>     - files that are located on the pool (ceph.file.layout.pool = $pool_name)
> > > > >>     - directories that are set to write files to the pool
> > > > >> (ceph.dir.layout.pool = $pool_name)
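> > > > >>
> > > > >> A sketch for the directory side (untested; the file side is analogous
> > > > >> with ceph.file.layout.pool, and the /ceph mountpoint is a placeholder):
> > > > >>     find /ceph -type d | while read -r d ; do \
> > > > >>       p=$(getfattr -n ceph.dir.layout.pool --only-values "$d" 2>/dev/null) ; \
> > > > >>       [ "$p" = "$pool_name" ] && echo "$d" ; done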
> > > > >>
> > > > >> * After you confirm that no files or directories are pointing at the
> > > > >> pool anymore, run 'ceph df' and look at the number of objects in the
> > > > >> pool.  Ideally, it would be zero.  But more than likely it isn't.  This
> > > > >> could be a simple mismatch in the object count in cephfs (harmless), or
> > > > >> there could be clients with open filehandles on files that have been
> > > > >> removed.  Such objects will still appear in the rados listing of the
> > > > >> pool[1]:
> > > > >>     rados -p $pool_name ls
> > > > >>     for obj in $(rados -p $pool_name ls); do echo $obj; \
> > > > >>       rados -p $pool_name getxattr $obj parent | strings; done
> > > > >>
> > > > >> * To check for clients with access to these stray objects, dump the mds
> > > > >> cache:
> > > > >>     ceph daemon mds.ceph1 dump cache /tmp/cache.txt
> > > > >>
> > > > >> * Look for lines that refer to the stray objects, like this:
> > > > >>     [inode 0x10000020fbc [2,head] ~mds0/stray6/10000020fbc auth v7440537
> > > > >> s=252778863 nl=0 n(v0 rc2020-12-11T21:17:59.454863-0600 b252778863
> > > > >> 1=1+0) (iversion lock) caps={9541437=pAsLsXsFscr/pFscr@2},l=9541437 |
> > > > >> caps=1 authpin=0 0x563a7e52a000]
> > > > >>
> > > > >> * The 'caps' field in the output above contains the client session id
> > > > >> (eg 9541437).  Search the MDS for sessions that match to identify the
> > > > >> client:
> > > > >>     ceph daemon mds.ceph1 session ls > session.txt
> > > > >>     Search through 'session.txt' for matching entries.  This will give
> > > > >> you the IP address of the client:
> > > > >>           "id": 9541437,
> > > > >>           "entity": {
> > > > >>               "name": {
> > > > >>                   "type": "client",
> > > > >>                   "num": 9541437
> > > > >>               },
> > > > >>               "addr": {
> > > > >>                   "type": "v1",
> > > > >>                   "addr": "10.13.5.48:0",
> > > > >>                   "nonce": 2011077845
> > > > >>               }
> > > > >>           },
> > > > >>
> > > > >> * Restart the client's connection to ceph to get it to drop the cap.  I
> > > > >> did this by rebooting the client, but there may be gentler ways to do it.
> > > > >>
> > > > >> * Once you've done this clean up, it should be safe to remove the pool
> > > > >> from cephfs:
> > > > >>     ceph fs rm_data_pool $fs_name $pool_name
> > > > >>
> > > > >> * Once the pool has been detached from cephfs, you can remove it from
> > > > >> ceph altogether:
> > > > >>     ceph osd pool rm $pool_name $pool_name --yes-i-really-really-mean-it
> > > > >>
> > > > >> Hope this helps,
> > > > >>
> > > > >> --Mike
> > > > >> [1]http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005234.html
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 4/8/21 5:41 PM, Joshua West wrote:
> > > > >>> Hey everyone.
> > > > >>>
> > > > >>> Inside of cephfs, I have a directory on which I set a directory layout
> > > > >>> field to use an erasure-coded (CLAY) pool specific to the task. The
> > > > >>> rest of my cephfs is using normal replication.
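> > > > >>>
> > > > >>> For context, the layout was set the usual way, along the lines of
> > > > >>> (path illustrative):
> > > > >>>     setfattr -n ceph.dir.layout.pool -v claypool /cephfs/backups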
> > > > >>>
> > > > >>> Fast forward some time, and the EC directory has been used pretty
> > > > >>> extensively, and through some bad luck and poor timing, ~200 pgs are in
> > > > >>> an incomplete state, and the OSDs are completely gone and
> > > > >>> unrecoverable. (Specifically OSD 31 and 34, not that it matters at
> > > > >>> this point)
> > > > >>>
> > > > >>> # ceph pg ls incomplete --> is attached for reference.
> > > > >>>
> > > > >>> Fortunately, it's primarily (only) my on-site backups, and other
> > > > >>> replaceable data inside of that directory.
> > > > >>>
> > > > >>> I tried for a few days to recover the PGs:
> > > > >>>    - Recreate blank OSDs with correct ID (was blocked by non-existent OSDs)
> > > > >>>    - Deep Scrub
> > > > >>>    - osd_find_best_info_ignore_history_les = true (`pg query` was
> > > > >>> showing related error)
> > > > >>> etc.
> > > > >>>
> > > > >>> I've finally just accepted this pool to be a lesson learned, and want
> > > > >>> to get the rest of my cephfs back to normal.
> > > > >>>
> > > > >>> My questions:
> > > > >>>
> > > > >>>    -- `ceph osd force-create-pg` doesn't appear to fix pgs, even for pgs
> > > > >>> with 0 objects
> > > > >>>    -- Deleting the pool seems like an appropriate step, but as I am
> > > > >>> using it via an xattr within cephfs, which is otherwise on another
> > > > >>> pool, I am not confident that this approach is safe?
> > > > >>>    -- cephfs currently blocks when attempting to touch every third
> > > > >>> file in the EC directory. Once I delete the pool, how will I remove
> > > > >>> the files if even `rm` is blocking?
> > > > >>>
> > > > >>> Thank you for your time,
> > > > >>>
> > > > >>> Joshua West
> > > > >>> _______________________________________________
> > > > >>> ceph-users mailing list -- ceph-users@xxxxxxx
> > > > >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > >>>
> > > > >>
> > > >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


