Okay, I gave up, as I should have long ago.

Unlinking was a massive pain in the rear, even when scripted, as I never found a good way to catch the hang without also hanging the script or the filesystem.

After several attempts, and because only a couple of primary directories were using the damaged pool, I simply removed the pool from cephfs and deleted it outright, planning to move the damaged directories into a hidden .dead dir as a pseudo-fix for myself.

After the pool deletion, I noticed that the directories were browsable, with all damaged files removed. However, I am unsure whether this was because the script had successfully unlinked all the files (it didn't actually hang, it just hit a task queue bug and never finished/closed) or because removing the pool also removed the files.

Either way, I am down 20T of (off-site recoverable) data, so off to downloading I go! haha

Michael,
Thank you for your help earlier. Hopefully this little saga is useful to someone in future too!

Joshua

On Wed, Apr 14, 2021 at 7:08 AM Joshua West <josh@xxxxxxx> wrote:
>
> Additional to my last note, I should have mentioned, I am exploring
> options to delete the damaged data, but in hopes to preserve what I
> can, prior to moving to simply deleting all data on that pool.
>
> When trying to simply empty pgs, it seems like the pgs don't exist.
>
> In attempting to follow:
> https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-pgs-inactive-3-pgs-incomplete-b97cbcb4b5a1
> with regard to deleting pgs with zero objects/data, I receive:
>
> #ceph pg ls incomplete
> ....
> 47.3ff 0 0 0 0 0 0 0 0 incomplete 3m 0'0 527856:6054 [7,9,2]p7 [7,9,2]p7...
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 47.3ff
> PG '47.3ff' not found
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op remove --pgid 47.3ff --force
> PG '47.3ff' not found
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op mark-complete --pgid 47.3ff
> PG '47.3ff' not found
>
> # ceph osd force-create-pg 47.3ff --yes-i-really-mean-it
> (worked, but I don't have the output handy) --> No change.
>
>
> Since I am having trouble with this process (can't delete pgs, can't
> get OIDs for incomplete pgs), I had the idea to start from the other
> end:
> Is there a method to determine which files are not stuck, to copy them
> out, prior to deleting the whole pool?
>
> As we know, `ls` becomes stuck, so how is it best to get a list of
> filepaths+filenames for cephfs?
> My current plan is to get that list, and simply brute force attempting
> to copy all files, but each copy in its own thread + timeout. Does
> this make sense?
>
>
> Joshua
>
> On Wed, Apr 14, 2021 at 6:03 AM Joshua West <josh@xxxxxxx> wrote:
> >
> > Just working this through, how does one identify the OIDs within a PG,
> > without list_unfound?
> >
> > I've been poking around, but can't seem to find a command that outputs
> > the necessary OIDs. I tried a handful of cephfs commands, but they of
> > course become stuck, and ceph pg commands haven't revealed the OID
> > yet.
> >
> > Joshua
> >
> >
> > Joshua West
> > President
> > 403-456-0072
> > CAYK.ca
> >
> >
> > On Fri, Apr 9, 2021 at 12:15 PM Joshua West <josh@xxxxxxx> wrote:
> > >
> > > Absolutely!
> > >
> > > Attached the files, they're not duplicate, but revised (as I tidied up
> > > what I could to make things easier)
> > >
> > > > Correct me if I'm wrong, but you are willing to throw away all of the data on this pool?
> > >
> > > Correct, if push comes to shove, I accept that data-loss is probable.
> > > If I can manage to save the data, I would definitely be okay with that
> > > too though.
> > >
> > > Still learning to program, but know python quite well. I am going to
> > > push off on a script to clean up per your previously noted steps in
> > > the language I know! But will hold off on unlinking everything for the
> > > moment.
> > >
> > > Thank you again for your time, your help has already been invaluable to me.
> > >
> > > Joshua
> > >
> > >
> > > Joshua West
> > > President
> > > 403-456-0072
> > > CAYK.ca
> > >
> > >
> > > On Fri, Apr 9, 2021 at 7:03 AM Michael Thomas <wart@xxxxxxxxxxx> wrote:
> > > >
> > > > Hi Joshua,
> > > >
> > > > I'll dig into this output a bit more later, but here are my thoughts
> > > > right now. I'll preface this by saying that I've never had to clean up
> > > > from unrecoverable incomplete PGs, so some of what I suggest may not
> > > > work/apply or be the ideal fix in your case.
> > > >
> > > > Correct me if I'm wrong, but you are willing to throw away all of the
> > > > data on this pool? This should make it easier because we don't have to
> > > > worry about recovering any lost data.
> > > >
> > > > If this is the case, then I think the general strategy would be:
> > > >
> > > > 1) Identify and remove any files/directories in cephfs that are located
> > > > on this pool (based on ceph.file.layout.pool=claypool and
> > > > ceph.dir.layout.pool=claypool). Use 'unlink' instead of 'rm' to remove
> > > > the files; it should be less prone to hanging.
> > > >
> > > > 2) Wait a bit for ceph to clean up any unreferenced objects. Watch the
> > > > output of 'ceph df' to see how many objects are listed for the pool.
> > > >
> > > > 3) Use 'rados -p claypool ls' to identify the remaining objects. Use
> > > > the OID identifier to calculate the inode number of each file, then
> > > > search cephfs to identify which files these belong to. I would expect
> > > > it would be none, as you already deleted the files in step 1.
> > > >
> > > > 4) With nothing in the cephfs metadata referring to the objects anymore,
> > > > it should be safe to remove them with 'rados -p claypool rm <object>'.
> > > >
> > > > 5) Remove the now-empty pool from cephfs
> > > >
> > > > 6) Remove the now-empty pool from ceph
> > > >
> > > > Can you also include the output of 'ceph df'?
> > > >
> > > > --Mike
> > > >
> > > > On 4/9/21 7:31 AM, Joshua West wrote:
> > > > > Thank you Mike!
> > > > >
> > > > > This is honestly a way more detailed reply than I was expecting.
> > > > > You've equipped me with new tools to work with. Thank you!
> > > > >
> > > > > I don't actually have any unfound pgs... only "incomplete" ones, which
> > > > > limits the usefulness of:
> > > > > `grep recovery_unfound`
> > > > > `ceph pg $pg list_unfound`
> > > > > `ceph pg $pg mark_unfound_lost delete`
> > > > >
> > > > > I don't seem to see equivalent commands for incomplete pgs, save for
> > > > > grep of course.
> > > > >
> > > > > This does make me slightly more hopeful that recovery might be
> > > > > possible if the pgs are incomplete and stuck, but not unfound..? Not
> > > > > going to get my hopes too high.
> > > > >
> > > > > Going to attach a few items just to keep from bugging me, if anyone
> > > > > can take a glance, it would be appreciated.
> > > > >
> > > > > In the meantime, in the absence of the above commands, what's the best
> > > > > way to clean this up under the assumption that the data is lost?
> > > > >
> > > > > ~Joshua
> > > > >
> > > > >
> > > > > Joshua West
> > > > > President
> > > > > 403-456-0072
> > > > > CAYK.ca
> > > > >
> > > > >
> > > > > On Thu, Apr 8, 2021 at 6:15 PM Michael Thomas <wart@xxxxxxxxxxx> wrote:
> > > > >>
> > > > >> Hi Joshua,
> > > > >>
> > > > >> I have had a similar issue three different times on one of my cephfs
> > > > >> pools (15.2.10). The first time this happened I had lost some OSDs.
> > > > >> In all cases I ended up with degraded PGs with unfound objects that could
> > > > >> not be recovered.
> > > > >>
> > > > >> Here's how I recovered from the situation. Note that this will
> > > > >> permanently remove the affected files from ceph. Restoring them from
> > > > >> backup is an exercise left to the reader.
> > > > >>
> > > > >> * Make a list of the affected PGs:
> > > > >>    ceph pg dump_stuck | grep recovery_unfound > pg.txt
> > > > >>
> > > > >> * Make a list of the affected objects (OIDs):
> > > > >>    cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; ceph pg $pg list_unfound | jq '.objects[].oid.oid' ; done | sed -e 's/"//g' > oid.txt
> > > > >>
> > > > >> * Convert the OID numbers to inodes using 'printf "%d\n" 0x${oid}' and
> > > > >> put the results in a file called 'inum.txt'
> > > > >>
> > > > >> * On a ceph client, find the files that correspond to the affected inodes:
> > > > >>    cat inum.txt | while read inum ; do echo -n "${inum} " ; find /ceph/frames/O3/raw -inum ${inum} ; done > files.txt
> > > > >>
> > > > >> * It may be helpful to put this table of PG, OID, inum, and files into a
> > > > >> spreadsheet to keep track of what's been done.
> > > > >>
> > > > >> * On the ceph client, use 'unlink' to remove the files from the
> > > > >> filesystem. Do not use 'rm', as it will hang while calling 'stat()' on
> > > > >> each file. Even unlink may hang when you first try it. If it does
> > > > >> hang, do the following to get it unstuck:
> > > > >>    - Reboot the client
> > > > >>    - Restart each mon and the mgr. I rebooted each mon/mgr, but it may
> > > > >>      be sufficient to restart the services without a reboot.
> > > > >>    - Try using 'unlink' again
> > > > >>
> > > > >> * After all of the affected files have been removed, go through the list
> > > > >> of PGs and remove the unfound OIDs:
> > > > >>    ceph pg $pgid mark_unfound_lost delete
> > > > >>
> > > > >> ...or if you're feeling brave, delete them all at once:
> > > > >>    cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; ceph pg $pg mark_unfound_lost delete ; done
> > > > >>
> > > > >> * Watch the output of 'ceph -s' to see the health of the pools/pgs recover.
> > > > >>
> > > > >> * Restore the deleted files from backup, or decide that you don't care
> > > > >> about them and don't do anything
> > > > >>
> > > > >> This procedure lets you fix the problem without deleting the affected
> > > > >> pool. To be honest, the first time it happened, my solution was to
> > > > >> first copy all of the data off of the affected pool and onto a new pool.
> > > > >> I later found this to be unnecessary. But if you want to pursue this,
> > > > >> here's what I suggest:
> > > > >>
> > > > >> * Follow the steps above to get rid of the affected files. I feel this
> > > > >> should still be done even though you don't care about saving the data,
> > > > >> to prevent corruption in the cephfs metadata.
> > > > >>
> > > > >> * Go through the entire filesystem and look for:
> > > > >>    - files that are located on the pool (ceph.file.layout.pool = $pool_name)
> > > > >>    - directories that are set to write files to the pool
> > > > >>      (ceph.dir.layout.pool = $pool_name)
> > > > >>
> > > > >> * After you confirm that no files or directories are pointing at the
> > > > >> pool anymore, run 'ceph df' and look at the number of objects in the
> > > > >> pool. Ideally, it would be zero. But more than likely it isn't. This
> > > > >> could be a simple mismatch in the object count in cephfs (harmless), or
> > > > >> there could be clients with open filehandles on files that have been
> > > > >> removed.
> > > > >> Such objects will still appear in the rados listing of the
> > > > >> pool[1]:
> > > > >>    rados -p $pool_name ls
> > > > >>    for obj in $(rados -p $pool_name ls); do echo $obj; rados -p $pool_name getxattr $obj parent | strings; done
> > > > >>
> > > > >> * To check for clients with access to these stray objects, dump the mds
> > > > >> cache:
> > > > >>    ceph daemon mds.ceph1 dump cache /tmp/cache.txt
> > > > >>
> > > > >> * Look for lines that refer to the stray objects, like this:
> > > > >>    [inode 0x10000020fbc [2,head] ~mds0/stray6/10000020fbc auth v7440537 s=252778863 nl=0 n(v0 rc2020-12-11T21:17:59.454863-0600 b252778863 1=1+0) (iversion lock) caps={9541437=pAsLsXsFscr/pFscr@2},l=9541437 | caps=1 authpin=0 0x563a7e52a000]
> > > > >>
> > > > >> * The 'caps' field in the output above contains the client session id
> > > > >> (eg 9541437). Search the MDS for sessions that match to identify the
> > > > >> client:
> > > > >>    ceph daemon mds.ceph1 session ls > session.txt
> > > > >> Search through 'session.txt' for matching entries. This will give
> > > > >> you the IP address of the client:
> > > > >>    "id": 9541437,
> > > > >>    "entity": {
> > > > >>        "name": {
> > > > >>            "type": "client",
> > > > >>            "num": 9541437
> > > > >>        },
> > > > >>        "addr": {
> > > > >>            "type": "v1",
> > > > >>            "addr": "10.13.5.48:0",
> > > > >>            "nonce": 2011077845
> > > > >>        }
> > > > >>    },
> > > > >>
> > > > >> * Restart the client's connection to ceph to get it to drop the cap. I
> > > > >> did this by rebooting the client, but there may be gentler ways to do it.
> > > > >>
> > > > >> * Once you've done this clean up, it should be safe to remove the pool
> > > > >> from cephfs:
> > > > >>    ceph fs rm_data_pool $fs_name $pool_name
> > > > >>
> > > > >> * Once the pool has been detached from cephfs, you can remove it from
> > > > >> ceph altogether:
> > > > >>    ceph osd pool rm $pool_name $pool_name --yes-i-really-really-mean-it
> > > > >>
> > > > >> Hope this helps,
> > > > >>
> > > > >> --Mike
> > > > >> [1]http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005234.html
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 4/8/21 5:41 PM, Joshua West wrote:
> > > > >>> Hey everyone.
> > > > >>>
> > > > >>> Inside of cephfs, I have a directory on which I set a directory layout
> > > > >>> field to use an erasure coded (CLAY) pool, specific to the task. The
> > > > >>> rest of my cephfs is using normal replication.
> > > > >>>
> > > > >>> Fast forward some time, and the EC directory has been used pretty
> > > > >>> extensively, and through some bad luck and poor timing, ~200pgs are in
> > > > >>> an incomplete state, and the OSDs are completely gone and
> > > > >>> unrecoverable. (Specifically OSD 31 and 34, not that it matters at
> > > > >>> this point)
> > > > >>>
> > > > >>> # ceph pg ls incomplete --> is attached for reference.
> > > > >>>
> > > > >>> Fortunately, it's primarily (only) my on-site backups, and other
> > > > >>> replaceable data inside of
> > > > >>>
> > > > >>> I tried for a few days to recover the PGs:
> > > > >>> - Recreate blank OSDs with correct ID (was blocked by nonexistent OSDs)
> > > > >>> - Deep Scrub
> > > > >>> - osd_find_best_info_ignore_history_les = true (`pg query` was
> > > > >>> showing related error)
> > > > >>> etc.
> > > > >>>
> > > > >>> I've finally just accepted this pool to be a lesson learned, and want
> > > > >>> to get the rest of my cephfs back to normal.
> > > > >>>
> > > > >>> My questions:
> > > > >>>
> > > > >>> -- `ceph osd force-create-pg` doesn't appear to fix pgs, even for pgs
> > > > >>> with 0 objects
> > > > >>> -- Deleting the pool seems like an appropriate step, but as I am
> > > > >>> using an xattr within cephfs, which is otherwise on another pool, I am
> > > > >>> not confident that this approach is safe?
> > > > >>> -- cephfs currently blocks when attempting to impact every third file
> > > > >>> in the EC directory. Once I delete the pool, how will I remove the
> > > > >>> files if even `rm` is blocking?
> > > > >>>
> > > > >>> Thank you for your time,
> > > > >>>
> > > > >>> Joshua West
> > > > >>> _______________________________________________
> > > > >>> ceph-users mailing list -- ceph-users@xxxxxxx
> > > > >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
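
Note on the OID-to-inode step in the quoted procedure: a minimal Python sketch of that conversion, for anyone who would rather script it than use printf. It assumes CephFS data object names have the form <inode in hex>.<object index in hex> (for example 10000020fbc.00000000, matching the stray inode 0x10000020fbc shown in the thread). The function names and the sample values are made up for illustration and are not part of any ceph tooling.

```python
def oid_to_inum(oid: str) -> int:
    """Return the decimal inode number encoded in a CephFS data object name.

    Assumption: the name looks like '<inode hex>.<object index hex>'.
    A bare hex inode without the '.<index>' suffix is accepted as well.
    """
    # Drop surrounding whitespace and any JSON quotes left over from jq,
    # then keep only the part before the first '.', which is the inode in hex.
    inode_hex = oid.strip().strip('"').split(".", 1)[0]
    return int(inode_hex, 16)


def oids_to_inums(oids):
    """Map object names to a sorted list of unique inode numbers.

    Several objects can belong to the same file, so duplicates are collapsed.
    Blank lines are skipped.
    """
    return sorted({oid_to_inum(o) for o in oids if o.strip()})


if __name__ == "__main__":
    # Hypothetical object names, shaped like the output of 'rados -p <pool> ls'.
    sample = [
        "10000020fbc.00000000",
        "10000020fbc.00000001",
        "10000020fbd.00000000",
    ]
    for inum in oids_to_inums(sample):
        print(inum)
```

The printed numbers are what 'find <mountpoint> -inum <number>' expects. Feed the functions object names only: a PG id such as 47.3ff would also parse as hex and yield a meaningless inode, so filter those lines out first if the input file mixes both.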