Just working this through: how does one identify the OIDs within a PG
without list_unfound? I've been poking around, but can't seem to find a
command that outputs the necessary OIDs. I tried a handful of cephfs
commands, but they of course become stuck, and the ceph pg commands
haven't revealed the OIDs yet.

Joshua

Joshua West
President
403-456-0072
CAYK.ca

On Fri, Apr 9, 2021 at 12:15 PM Joshua West <josh@xxxxxxx> wrote:
>
> Absolutely!
>
> Attached the files; they're not duplicates, but revised versions (I
> tidied up what I could to make things easier).
>
> > Correct me if I'm wrong, but you are willing to throw away all of the data on this pool?
>
> Correct. If push comes to shove, I accept that data loss is probable.
> If I can manage to save the data, I would definitely be okay with that
> too, though.
>
> Still learning to program, but I know python quite well. I am going to
> push on with a script to clean things up per your previously noted
> steps, in the language I know! But I will hold off on unlinking
> everything for the moment.
>
> Thank you again for your time; your help has already been invaluable to me.
>
> Joshua
>
> Joshua West
> President
> 403-456-0072
> CAYK.ca
>
> On Fri, Apr 9, 2021 at 7:03 AM Michael Thomas <wart@xxxxxxxxxxx> wrote:
> >
> > Hi Joshua,
> >
> > I'll dig into this output a bit more later, but here are my thoughts
> > right now. I'll preface this by saying that I've never had to clean up
> > from unrecoverable incomplete PGs, so some of what I suggest may not
> > work/apply or be the ideal fix in your case.
> >
> > Correct me if I'm wrong, but you are willing to throw away all of the
> > data on this pool? This should make it easier, because we don't have to
> > worry about recovering any lost data.
> >
> > If this is the case, then I think the general strategy would be:
> >
> > 1) Identify and remove any files/directories in cephfs that are located
> > on this pool (based on ceph.file.layout.pool=claypool and
> > ceph.dir.layout.pool=claypool). Use 'unlink' instead of 'rm' to remove
> > the files; it should be less prone to hanging.
> >
> > 2) Wait a bit for ceph to clean up any unreferenced objects. Watch the
> > output of 'ceph df' to see how many objects are listed for the pool.
> >
> > 3) Use 'rados -p claypool ls' to identify the remaining objects. Use
> > the OID identifier to calculate the inode number of each file, then
> > search cephfs to identify which files these belong to. I would expect
> > it to be none, as you already deleted the files in step 1.
> >
> > 4) With nothing in the cephfs metadata referring to the objects anymore,
> > it should be safe to remove them with 'rados -p claypool rm'.
> >
> > 5) Remove the now-empty pool from cephfs.
> >
> > 6) Remove the now-empty pool from ceph.
> >
> > Can you also include the output of 'ceph df'?
> >
> > --Mike
> >
> > On 4/9/21 7:31 AM, Joshua West wrote:
> > > Thank you Mike!
> > >
> > > This is honestly a way more detailed reply than I was expecting.
> > > You've equipped me with new tools to work with. Thank you!
> > >
> > > I don't actually have any unfound pgs... only "incomplete" ones, which
> > > limits the usefulness of:
> > >   `grep recovery_unfound`
> > >   `ceph pg $pg list_unfound`
> > >   `ceph pg $pg mark_unfound_lost delete`
> > >
> > > I don't seem to see equivalent commands for incomplete pgs, save for
> > > grep of course.
> > >
> > > This does make me slightly more hopeful that recovery might be
> > > possible if the pgs are incomplete and stuck, but not unfound..? Not
> > > going to get my hopes too high.
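For the question at the top of the thread (getting at the OIDs inside a
specific PG when list_unfound does not apply), one possible sketch follows.
The PG and OSD ids are placeholders; the online listing assumes a rados
build that accepts --pgid and may simply hang if the PG cannot serve reads,
while the offline listing reads the PG straight off an OSD that still holds
a copy and requires that OSD to be stopped first.

    pgid=22.1c   # placeholder: one of the incomplete PGs
    osd=12       # placeholder: a surviving OSD that still holds a copy of that PG

    # online attempt: list only the objects that live in this PG
    rados ls --pgid $pgid

    # offline fallback: stop the OSD and read its copy of the PG directly
    systemctl stop ceph-osd@$osd
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$osd \
        --pgid $pgid --op list
    systemctl start ceph-osd@$osd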
> > >
> > > Going to attach a few items to keep from bugging everyone; if anyone
> > > can take a glance, it would be appreciated.
> > >
> > > In the meantime, in the absence of the above commands, what's the best
> > > way to clean this up under the assumption that the data is lost?
> > >
> > > ~Joshua
> > >
> > > Joshua West
> > > President
> > > 403-456-0072
> > > CAYK.ca
> > >
> > > On Thu, Apr 8, 2021 at 6:15 PM Michael Thomas <wart@xxxxxxxxxxx> wrote:
> > >>
> > >> Hi Joshua,
> > >>
> > >> I have had a similar issue three different times on one of my cephfs
> > >> pools (15.2.10). The first time this happened I had lost some OSDs. In
> > >> all cases I ended up with degraded PGs with unfound objects that could
> > >> not be recovered.
> > >>
> > >> Here's how I recovered from the situation. Note that this will
> > >> permanently remove the affected files from ceph. Restoring them from
> > >> backup is an exercise left to the reader.
> > >>
> > >> * Make a list of the affected PGs:
> > >>     ceph pg dump_stuck | grep recovery_unfound > pg.txt
> > >>
> > >> * Make a list of the affected objects (OIDs):
> > >>     cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; ceph pg $pg list_unfound | jq '.objects[].oid.oid' ; done | sed -e 's/"//g' > oid.txt
> > >>
> > >> * Convert the OID numbers to inodes using 'printf "%d\n" 0x${oid}' and
> > >> put the results in a file called 'inum.txt'.
> > >>
> > >> * On a ceph client, find the files that correspond to the affected inodes:
> > >>     cat inum.txt | while read inum ; do echo -n "${inum} " ; find /ceph/frames/O3/raw -inum ${inum} ; done > files.txt
> > >>
> > >> * It may be helpful to put this table of PG, OID, inum, and files into a
> > >> spreadsheet to keep track of what's been done.
> > >>
> > >> * On the ceph client, use 'unlink' to remove the files from the
> > >> filesystem. Do not use 'rm', as it will hang while calling 'stat()' on
> > >> each file. Even unlink may hang when you first try it. If it does
> > >> hang, do the following to get it unstuck:
> > >>   - Reboot the client
> > >>   - Restart each mon and the mgr. I rebooted each mon/mgr, but it may
> > >>     be sufficient to restart the services without a reboot.
> > >>   - Try using 'unlink' again
> > >>
> > >> * After all of the affected files have been removed, go through the list
> > >> of PGs and remove the unfound OIDs:
> > >>     ceph pg $pgid mark_unfound_lost delete
> > >>   ...or, if you're feeling brave, delete them all at once:
> > >>     cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; ceph pg $pg mark_unfound_lost delete ; done
> > >>
> > >> * Watch the output of 'ceph -s' to see the health of the pools/pgs recover.
> > >>
> > >> * Restore the deleted files from backup, or decide that you don't care
> > >> about them and don't do anything.
> > >>
> > >> This procedure lets you fix the problem without deleting the affected
> > >> pool. To be honest, the first time it happened, my solution was to
> > >> first copy all of the data off of the affected pool and onto a new pool.
> > >> I later found this to be unnecessary. But if you want to pursue this,
> > >> here's what I suggest:
> > >>
> > >> * Follow the steps above to get rid of the affected files. I feel this
> > >> should still be done even though you don't care about saving the data,
> > >> to prevent corruption in the cephfs metadata.
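A rough consolidation of the PG / OID / inode / file steps quoted above into
a single pass might look like the following. It is only a sketch: pg.txt and
the /ceph/frames/O3/raw path are reused from the message above, so the mount
point has to be swapped for your own filesystem, and jq must be installed.

    # PGs with unfound objects
    ceph pg dump_stuck | grep recovery_unfound | awk '{print $1}' > pg.txt

    # unfound object names look like <hex-inode>.<stripe>; keep the inode
    # part, convert it to decimal, and look the inode up in the filesystem
    while read pg ; do
        ceph pg $pg list_unfound | jq -r '.objects[].oid.oid' | cut -d. -f1
    done < pg.txt | sort -u | while read oid ; do
        inum=$(printf '%d' 0x${oid})                    # hex OID prefix -> inode number
        path=$(find /ceph/frames/O3/raw -inum ${inum})  # which file owns that inode
        echo "${oid} ${inum} ${path}"
    done > files.txt

The sort -u is there because several unfound stripes of the same file map
back to the same inode.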
> > >>
> > >> * Go through the entire filesystem and look for:
> > >>   - files that are located on the pool (ceph.file.layout.pool = $pool_name)
> > >>   - directories that are set to write files to the pool
> > >>     (ceph.dir.layout.pool = $pool_name)
> > >>
> > >> * After you confirm that no files or directories are pointing at the
> > >> pool anymore, run 'ceph df' and look at the number of objects in the
> > >> pool. Ideally, it would be zero, but more than likely it isn't. This
> > >> could be a simple mismatch in the object count in cephfs (harmless), or
> > >> there could be clients with open filehandles on files that have been
> > >> removed. Such objects will still appear in the rados listing of the
> > >> pool[1]:
> > >>     rados -p $pool_name ls
> > >>     for obj in $(rados -p $pool_name ls); do echo $obj; rados -p $pool_name getxattr $obj parent | strings; done
> > >>
> > >> * To check for clients with access to these stray objects, dump the mds
> > >> cache:
> > >>     ceph daemon mds.ceph1 dump cache /tmp/cache.txt
> > >>
> > >> * Look for lines that refer to the stray objects, like this:
> > >>     [inode 0x10000020fbc [2,head] ~mds0/stray6/10000020fbc auth v7440537
> > >>     s=252778863 nl=0 n(v0 rc2020-12-11T21:17:59.454863-0600 b252778863
> > >>     1=1+0) (iversion lock) caps={9541437=pAsLsXsFscr/pFscr@2},l=9541437 |
> > >>     caps=1 authpin=0 0x563a7e52a000]
> > >>
> > >> * The 'caps' field in the output above contains the client session id
> > >> (e.g. 9541437). Search the MDS for sessions that match to identify the
> > >> client:
> > >>     ceph daemon mds.ceph1 session ls > session.txt
> > >>   Search through 'session.txt' for matching entries. This will give
> > >>   you the IP address of the client:
> > >>     "id": 9541437,
> > >>     "entity": {
> > >>         "name": {
> > >>             "type": "client",
> > >>             "num": 9541437
> > >>         },
> > >>         "addr": {
> > >>             "type": "v1",
> > >>             "addr": "10.13.5.48:0",
> > >>             "nonce": 2011077845
> > >>         }
> > >>     },
> > >>
> > >> * Restart the client's connection to ceph to get it to drop the cap. I
> > >> did this by rebooting the client, but there may be gentler ways to do it.
> > >>
> > >> * Once you've done this cleanup, it should be safe to remove the pool
> > >> from cephfs:
> > >>     ceph fs rm_data_pool $fs_name $pool_name
> > >>
> > >> * Once the pool has been detached from cephfs, you can remove it from
> > >> ceph altogether:
> > >>     ceph osd pool rm $pool_name $pool_name --yes-i-really-really-mean-it
> > >>
> > >> Hope this helps,
> > >>
> > >> --Mike
> > >>
> > >> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005234.html
> > >>
> > >> On 4/8/21 5:41 PM, Joshua West wrote:
> > >>> Hey everyone.
> > >>>
> > >>> Inside of cephfs, I have a directory which I set up with a directory
> > >>> layout field to use an erasure-coded (CLAY) pool specific to the task.
> > >>> The rest of my cephfs is using normal replication.
> > >>>
> > >>> Fast forward some time: the EC directory has been used pretty
> > >>> extensively, and through some bad luck and poor timing, ~200 PGs are in
> > >>> an incomplete state, and the OSDs are completely gone and
> > >>> unrecoverable. (Specifically OSDs 31 and 34, not that it matters at
> > >>> this point.)
> > >>>
> > >>> # ceph pg ls incomplete --> is attached for reference.
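The layout sweep suggested earlier in the thread (step 1 of the strategy,
and the "go through the entire filesystem" step) could be sketched roughly
as below. The pool name and mount point are placeholders, getfattr comes
from the attr package, and walking the whole tree will be slow on a large
filesystem.

    pool=claypool        # pool name used in this thread; substitute your own
    fsroot=/mnt/cephfs   # placeholder mount point of the filesystem

    # directories only expose ceph.dir.layout.pool when a layout was set
    # explicitly, so directories without one are silently skipped
    find "$fsroot" -type d -print0 | while IFS= read -r -d '' d ; do
        getfattr --only-values -n ceph.dir.layout.pool "$d" 2>/dev/null \
            | grep -qx "$pool" && echo "DIR  $d"
    done

    # every regular file reports ceph.file.layout.pool
    find "$fsroot" -type f -print0 | while IFS= read -r -d '' f ; do
        getfattr --only-values -n ceph.file.layout.pool "$f" 2>/dev/null \
            | grep -qx "$pool" && echo "FILE $f"
    done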
> > >>>
> > >>> Fortunately, it's primarily (only) my on-site backups and other
> > >>> replaceable data inside of that directory.
> > >>>
> > >>> I tried for a few days to recover the PGs:
> > >>>   - Recreate blank OSDs with the correct IDs (was blocked by the
> > >>>     non-existent OSDs)
> > >>>   - Deep scrub
> > >>>   - osd_find_best_info_ignore_history_les = true (`pg query` was
> > >>>     showing a related error)
> > >>>   - etc.
> > >>>
> > >>> I've finally just accepted this pool to be a lesson learned, and want
> > >>> to get the rest of my cephfs back to normal.
> > >>>
> > >>> My questions:
> > >>>
> > >>> -- `ceph osd force-create-pg` doesn't appear to fix the pgs, even for
> > >>> pgs with 0 objects.
> > >>> -- Deleting the pool seems like an appropriate step, but as the pool is
> > >>> referenced by an xattr within cephfs, which is otherwise on another
> > >>> pool, I am not confident that this approach is safe?
> > >>> -- cephfs currently blocks when attempting to access every third file
> > >>> in the EC directory. Once I delete the pool, how will I remove the
> > >>> files if even `rm` is blocking?
> > >>>
> > >>> Thank you for your time,
> > >>>
> > >>> Joshua West
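For the last question (removing the files when even rm blocks), one sketch
that builds on the unlink advice earlier in the thread; /mnt/cephfs/backups
is a placeholder for the EC directory.

    ecdir=/mnt/cephfs/backups   # placeholder: directory whose layout points at the EC pool

    # rm hangs because it stat()s each entry (as noted above); unlink(2) only
    # needs the directory entry, and find can usually answer -type from the
    # readdir d_type field without touching file data on the dead PGs
    find "$ecdir" -type f -exec unlink {} \;

    # once the files are gone, drop the now-empty directory tree
    find "$ecdir" -mindepth 1 -type d -empty -delete

If unlink still hangs, the earlier advice about restarting the client and
the mons/mgr applies; once 'ceph df' shows no remaining objects, the
'ceph fs rm_data_pool' and 'ceph osd pool rm' steps quoted above should be
the safe way to retire the pool.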