I am running Ceph 10.2.0-0ubuntu0.16.04.1.
I've run into a problem with the cephfs metadata pool. Specifically, I have a pg with an 'unfound' object.
But I can't figure out which object it is, since when I run:
ceph pg 12.94 list_unfound
it hangs (as does ceph pg 12.94 query). I know it's in the cephfs metadata pool since I run:
ceph pg ls-by-pool cephfs_metadata |egrep "pg_stat|12\\.94"
and it shows it there:
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
12.94 231 1 1 0 1 90 3092 3092 active+recovering+degraded 2016-05-18 23:49:15.718772 8957'386130 9472:367098 [1,4] 1 [1,4] 1 8935'385144 2016-05-18 10:46:46.123526 8337'379527 2016-05-14 22:37:05.974367
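The listing is hard to read as one long row, so here is a minimal sketch that pulls the interesting fields out of it. It assumes the column order shown in the header above, with state_stamp taking up two whitespace-separated fields (date and time). The function name pg_summary is made up for illustration.

```shell
#!/bin/sh
# Summarize one PG row from `ceph pg ls-by-pool` output read on stdin.
# Assumption: columns are in the order shown in the header above, and
# state_stamp occupies two whitespace-separated fields (date and time).
pg_summary() {
    # $1 = PG id to look for, e.g. 12.94
    awk -v pg="$1" '$1 == pg {
        printf "pg=%s unfound=%s state=%s acting=%s acting_primary=osd.%s\n",
               $1, $6, $10, $17, $18
    }'
}

# Sample row copied from the listing above; on a live cluster the input
# would come from `ceph pg ls-by-pool cephfs_metadata` instead.
sample="12.94 231 1 1 0 1 90 3092 3092 active+recovering+degraded 2016-05-18 23:49:15.718772 8957'386130 9472:367098 [1,4] 1 [1,4] 1 8935'385144 2016-05-18 10:46:46.123526 8337'379527 2016-05-14 22:37:05.974367"

printf '%s\n' "$sample" | pg_summary 12.94
```

For the row above this prints pg=12.94 unfound=1 state=active+recovering+degraded acting=[1,4] acting_primary=osd.1.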
OK, so what is hanging, and how can I get it to unhang so I can run 'mark_unfound_lost' on it?
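To at least get the shell back when one of these commands stalls, a small wrapper along these lines may help. It is only a sketch: it assumes GNU coreutils timeout is installed (it exits with status 124 when the limit is hit), and the name bounded is hypothetical. It does not fix the underlying hang; it only bounds how long the client waits.

```shell
#!/bin/sh
# Run a command with an upper time bound so a stuck call returns control.
# Assumption: GNU coreutils `timeout` is available; it exits with status
# 124 when the time limit is reached.
bounded() {
    # $1 = limit in seconds, remaining arguments = command to run
    limit="$1"; shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Intended use on the cluster (not executed here):
#   bounded 30 ceph pg 12.94 query
#   bounded 30 ceph pg 12.94 list_unfound
```

A return status of 124 then means the command was cut off by the time limit, and any other status is whatever the command itself returned.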
pg 12.94 is on osd.0
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.48996 root default
-2 0.89999     host nubo-1
 0 0.89999         osd.0         up  1.00000          1.00000
-3 0.89999     host nubo-2
 1 0.89999         osd.1         up  1.00000          1.00000
-4 0.89999     host nubo-3
 2 0.89999         osd.2         up  1.00000          1.00000
-5 0.92999     host nubo-19
 3 0.92999         osd.3         up  1.00000          1.00000
-6 0.92999     host nubo-20
 4 0.92999         osd.4         up  1.00000          1.00000
-7 0.92999     host nubo-21
 5 0.92999         osd.5         up  1.00000          1.00000
I cranked up the logging on osd.0. I see a lot of messages, but nothing interesting.
I've double-checked that all nodes can ping each other. I've run 'xfs_repair' on the underlying xfs storage to check for issues (there were none).
Can anyone suggest how to clear this hang so I can try to repair this system?