On 20.09.2017 16:49, hjcho616 wrote:
[SNIP]
12 inconsistent pgs and 109 scrub errors are something you should fix
first of all. Also, you can consider using the paid services of one of
the many ceph support companies that specialize in these kinds of
situations.

That being said, here are some suggestions...

When it comes to lost object recovery you have come about as far
as I have ever experienced, so everything after here is just
assumptions and wild guesswork about what you can try. I hope others
shout out if I tell you wildly wrong things.

If you have found data for pg 1.28 on the broken osd and have
checked all other working and non-working drives for that pg, then
you need to try and extract the pg from the broken drive. As
always in recovery cases, take a dd clone of the drive and work
from the cloned image, to avoid more damage to the drive and to
allow you to try multiple times.

You should add a temporary injection drive large enough for that
pg, and set its crush weight to 0 so it always drains. Make sure
it is up and registered properly in ceph.

The idea is to copy the pg manually from the broken osd to the
injection drive, since the export/import fails, making sure you
get all xattrs included. One can either copy the whole pg, or
just the "missing" objects. If there are few objects I would go
for that; if there are many I would take the whole pg. You won't
get data from leveldb, so I am not at all sure this would work,
but it is worth a shot.

- stop your injection osd, verify it is down and the process is not
  running.
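
For the manual copy with xattrs described above, here is a minimal Python sketch of one way it could be done and checked, not a tested recovery procedure. It assumes Linux and a filestore-style pg directory; the function names, the example paths, and the "user." prefix filter are my own illustrative assumptions. shutil.copy2 copies extended attributes on Linux only "where possible" and skips some errors silently, which is why the sketch compares the xattrs afterwards.

```python
import os
import shutil


def copy_tree_with_xattrs(src, dst):
    """Copy a directory tree; on Linux copy2 also copies xattrs where possible."""
    shutil.copytree(src, dst, symlinks=True, copy_function=shutil.copy2)


def xattrs(path, prefix="user."):
    """Return {name: value} for xattrs on path whose name starts with prefix."""
    return {
        name: os.getxattr(path, name, follow_symlinks=False)
        for name in os.listxattr(path, follow_symlinks=False)
        if name.startswith(prefix)
    }


def find_xattr_mismatches(src, dst, prefix="user."):
    """List relative paths that are missing in dst or whose xattrs differ."""
    mismatches = []
    # Check the top-level pg directory itself first.
    if xattrs(src, prefix) != xattrs(dst, prefix):
        mismatches.append(".")
    for root, dirs, files in os.walk(src):
        for name in dirs + files:
            s = os.path.join(root, name)
            rel = os.path.relpath(s, src)
            d = os.path.join(dst, rel)
            if not os.path.lexists(d) or xattrs(s, prefix) != xattrs(d, prefix):
                mismatches.append(rel)
    return mismatches


# Hypothetical usage (paths are assumptions, adjust to your own mounts):
# copy_tree_with_xattrs("/mnt/clone/current/1.28_head",
#                       "/var/lib/ceph/osd/ceph-99/current/1.28_head")
# print(find_xattr_mismatches("/mnt/clone/current/1.28_head",
#                             "/var/lib/ceph/osd/ceph-99/current/1.28_head"))
```

An empty list from find_xattr_mismatches only says the "user." xattrs match; it says nothing about the leveldb data. Note also that copy2 does not preserve file ownership, so that may need fixing on the injection osd afterwards.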
This is all, as I said, guesstimates, so your mileage may vary.

Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com