Dear Michael,

firstly, I'm a bit confused about why you started deleting data. The objects were unfound, but they were still there - that's a small issue. Now the data might be gone, and that's a real issue.

----------------------------
An aside, for anyone reading this: I have seen many threads where ceph admins started deleting objects or PGs, or even purging OSDs from a cluster, way too early. Trying to recover health by deleting data is a contradiction. Ceph has bugs and sometimes it needs some help finding everything again. As far as I know, for most of these bugs there are workarounds that allow full recovery with a bit of work.
----------------------------

First question: did you delete the entire object, or just a shard on one disk? Are there OSDs that might still have a copy? If the object is gone for good, the file references something that doesn't exist - it's like a bad sector - and you will probably need to delete the file. It is a bit strange that the operation does not err out with a read error. Maybe it doesn't because it waits for the unfound-objects state to be resolved?

As for all the other unfound objects: they are there somewhere - you didn't lose a disk or anything like that. Try pushing ceph to scan the correct OSDs, for example by restarting the newly added OSDs one by one, or something similar. Sometimes exporting a PG from one OSD and importing it into another forces a re-scan and the subsequent discovery of unfound objects. It is also possible that ceph will find these objects in the course of recovery, or when OSDs scrub or check for objects that can be deleted.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Michael Thomas <wart@xxxxxxxxxxx>
Sent: 17 September 2020 22:27:47
To: Frank Schilder; ceph-users@xxxxxxx
Subject: Re: multiple OSD crash, unfound objects

Hi Frank,

Yes, it does sound similar to your ticket.

I've tried a few things to restore the failed files (sketched below):

* Locate a missing object with 'ceph pg $pgid list_unfound'
* Convert the hex oid to a decimal inode number
* Identify the affected file with 'find /ceph -inum $inode'
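In shell, roughly (the PG id below is a made-up example; '/ceph' is the cephfs mount point; the jq field names assume the JSON layout that 'list_unfound' prints on Octopus, so double-check them on your own cluster):

  # Map each unfound object of one PG back to a cephfs path.  Object names
  # in a cephfs data pool look like "<hex inode>.<hex stripe index>".
  pgid=7.1c                            # made-up example PG id
  ceph pg "$pgid" list_unfound | \
    jq -r '.objects[].oid.oid' | \
    cut -d. -f1 | sort -u | \
    while read -r hexino; do
      ino=$(printf '%d' "0x$hexino")   # hex inode -> decimal inode
      echo "== inode $ino =="
      find /ceph -inum "$ino"          # /ceph = cephfs mount point
    done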
At this point, I know which file is affected by the missing object. As expected, attempts to read the file simply hang. Unexpectedly, attempts to 'ls' the file or its containing directory also hang. I presume from this that the stat() system call needs some information that is contained in the missing object, and is waiting for the object to become available.

Next I tried to remove the affected object with:

* ceph pg $pgid mark_unfound_lost delete

Now 'ceph status' shows one fewer missing objects, but attempts to 'ls' or 'rm' the affected file continue to hang.

Finally, I ran a scrub over the part of the filesystem containing the affected file:

ceph tell mds.ceph4 scrub start /frames/postO3/hoft recursive

Nothing seemed to come up during the scrub:

2020-09-17T14:56:15.208-0500 7f39bca24700 1 mds.ceph4 asok_command: scrub status {prefix=scrub status} (starting...)
2020-09-17T14:58:58.013-0500 7f39bca24700 1 mds.ceph4 asok_command: scrub start {path=/frames/postO3/hoft,prefix=scrub start,scrubops=[recursive]} (starting...)
2020-09-17T14:58:58.013-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub summary: active
2020-09-17T14:58:58.014-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub queued for path: /frames/postO3/hoft
2020-09-17T14:58:58.014-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub summary: active [paths:/frames/postO3/hoft]
2020-09-17T14:59:02.535-0500 7f39bca24700 1 mds.ceph4 asok_command: scrub status {prefix=scrub status} (starting...)
2020-09-17T15:00:12.520-0500 7f39bca24700 1 mds.ceph4 asok_command: scrub status {prefix=scrub status} (starting...)
2020-09-17T15:02:32.944-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub summary: idle
2020-09-17T15:02:32.945-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub complete with tag '1405e5c7-3ecf-4754-918e-129e9d101f7a'
2020-09-17T15:02:32.945-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub completed for path: /frames/postO3/hoft
2020-09-17T15:02:32.945-0500 7f39b5215700 0 log_channel(cluster) log [INF] : scrub summary: idle

After the scrub completed, access to the file (ls or rm) continues to hang. The MDS reports slow requests:

2020-09-17T15:11:05.654-0500 7f39b9a1e700 0 log_channel(cluster) log [WRN] : slow request 481.867381 seconds old, received at 2020-09-17T15:03:03.788058-0500: client_request(client.451432:11309 getattr pAsLsXsFs #0x1000005b1c0 2020-09-17T15:03:03.787602-0500 caller_uid=0, caller_gid=0{}) currently dispatched

Does anyone have any suggestions on how else to clean up from a permanently lost object?

--Mike

On 9/16/20 2:03 AM, Frank Schilder wrote:
> Sounds similar to this one: https://tracker.ceph.com/issues/46847
>
> If you have or can reconstruct the crush map from before adding the OSDs, you might be able to discover everything with the temporary reversal of the crush map method.
>
> Not sure if there is another method, I never got a reply to my question in the tracker.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Michael Thomas <wart@xxxxxxxxxxx>
> Sent: 16 September 2020 01:27:19
> To: ceph-users@xxxxxxx
> Subject: multiple OSD crash, unfound objects
>
> Over the weekend I had multiple OSD servers in my Octopus cluster (15.2.4) crash and reboot at nearly the same time. The OSDs are part of an erasure coded pool. At the time the cluster had been busy with a long-running (~week) remapping of a large number of PGs after I incrementally added more OSDs to the cluster. After bringing all of the OSDs back up, I have 25 unfound objects and 75 degraded objects. There are other problems reported, but I'm primarily concerned with these unfound/degraded objects.
>
> The pool with the missing objects is a cephfs pool. The files stored in the pool are backed up on tape, so I can easily restore individual files as needed (though I would not want to restore the entire filesystem).
>
> I tried following the guide at https://docs.ceph.com/docs/octopus/rados/troubleshooting/troubleshooting-pg/#unfound-objects. I found a number of OSDs that are still 'not queried'. Restarting a sampling of these OSDs changed the state from 'not queried' to 'already probed', but that did not recover any of the unfound or degraded objects.
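(For reference, the peers that a PG's primary still wants to probe for unfound objects can be listed straight from 'ceph pg query'; the jq path below assumes the Octopus JSON layout and the PG and OSD ids are made-up examples, so treat the details as something to verify:

  pgid=7.1c                          # made-up example PG with unfound objects

  # Show the peers the primary would still like to hear from, with their
  # per-peer status ("not queried", "already probed", "osd is down", ...).
  ceph pg "$pgid" query | \
    jq '.recovery_state[] | select(.might_have_unfound != null) | .might_have_unfound'

  # A peer stuck in "not queried" can be nudged by forcing it to re-peer,
  # e.g. by marking it down (it rejoins immediately if the daemon is running)
  # or by restarting the daemon on its host:
  ceph osd down 123                  # made-up OSD id taken from the output above
  # systemctl restart ceph-osd@123   # alternative, run on the OSD's host
)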
> I have also tried 'ceph pg deep-scrub' on the affected PGs, but never saw them get scrubbed. I also tried doing a 'ceph pg force-recovery' on the affected PGs, but only one seems to have been tagged accordingly (see ceph -s output below).
>
> The guide also says "Sometimes it simply takes some time for the cluster to query possible locations." I'm not sure how long "some time" might take, but it hasn't changed after several hours.
>
> My questions are:
>
> * Is there a way to force the cluster to query the possible locations sooner?
>
> * Is it possible to identify the files in cephfs that are affected, so that I could delete only the affected files and restore them from backup tapes?
>
> --Mike
>
> ceph -s:
>
>   cluster:
>     id:     066f558c-6789-4a93-aaf1-5af1ba01a3ad
>     health: HEALTH_ERR
>             1 clients failing to respond to capability release
>             1 MDSs report slow requests
>             25/78520351 objects unfound (0.000%)
>             2 nearfull osd(s)
>             Reduced data availability: 1 pg inactive
>             Possible data damage: 9 pgs recovery_unfound
>             Degraded data redundancy: 75/626645098 objects degraded (0.000%), 9 pgs degraded
>             1013 pgs not deep-scrubbed in time
>             1013 pgs not scrubbed in time
>             2 pool(s) nearfull
>             1 daemons have recently crashed
>             4 slow ops, oldest one blocked for 77939 sec, daemons [osd.0,osd.41] have slow ops.
>
>   services:
>     mon: 4 daemons, quorum ceph1,ceph2,ceph3,ceph4 (age 9d)
>     mgr: ceph3(active, since 11d), standbys: ceph2, ceph4, ceph1
>     mds: archive:1 {0=ceph4=up:active} 3 up:standby
>     osd: 121 osds: 121 up (since 6m), 121 in (since 101m); 4 remapped pgs
>
>   task status:
>     scrub status:
>       mds.ceph4: idle
>
>   data:
>     pools:   9 pools, 2433 pgs
>     objects: 78.52M objects, 298 TiB
>     usage:   412 TiB used, 545 TiB / 956 TiB avail
>     pgs:     0.041% pgs unknown
>              75/626645098 objects degraded (0.000%)
>              135224/626645098 objects misplaced (0.022%)
>              25/78520351 objects unfound (0.000%)
>              2421 active+clean
>              5    active+recovery_unfound+degraded
>              3    active+recovery_unfound+degraded+remapped
>              2    active+clean+scrubbing+deep
>              1    unknown
>              1    active+forced_recovery+recovery_unfound+degraded
>
>   progress:
>     PG autoscaler decreasing pool 7 PGs from 1024 to 512 (5d)
>       [............................]
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx