Double OSD failure (won't start) any recovery options?

I've had two OSDs fail, and I'm pretty sure they won't recover from this. I'm looking for help getting them back online, if that's possible. Starting an OSD aborts with:
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: bad checksum on pg_log_entry_t
- This is the failure I get (http://pastebin.com/raw/jBp6YgUp) when starting my OSD.
- The source code related to it is here: https://github.com/badone/ceph/blob/master/src/osd/osd_types.cc#L3422-3433
- The OSD logs are here: http://pastebin.com/raw/PWwA0ae6
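
For context on the error above: the message indicates that a checksum stored with a PG log entry no longer matches the bytes read back. Below is a minimal Python sketch of that kind of checksum-guarded decode; it is not Ceph's actual code. The entry layout, the names encode_entry, decode_entry, flip_bit and MalformedInput, and the use of zlib.crc32 are all assumptions made for illustration; Ceph's real on-disk encoding and CRC variant differ.

```python
import struct
import zlib


class MalformedInput(Exception):
    """Hypothetical stand-in for ceph::buffer::malformed_input."""


def encode_entry(payload: bytes) -> bytes:
    # Assumed layout: 4-byte little-endian length, payload, 4-byte CRC of payload.
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack("<I", len(payload)) + payload + struct.pack("<I", crc)


def decode_entry(buf: bytes) -> bytes:
    # Reject anything whose framing or checksum does not match.
    if len(buf) < 8:
        raise MalformedInput("truncated entry")
    (length,) = struct.unpack_from("<I", buf, 0)
    if len(buf) != 8 + length:
        raise MalformedInput("bad length")
    payload = buf[4:4 + length]
    (stored_crc,) = struct.unpack_from("<I", buf, 4 + length)
    if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
        raise MalformedInput("bad checksum on entry")
    return payload


def flip_bit(buf: bytes, bit: int) -> bytes:
    # Simulate silent corruption by inverting one bit of the buffer.
    damaged = bytearray(buf)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)


if __name__ == "__main__":
    good = encode_entry(b"pg log entry")
    print(decode_entry(good))
    try:
        decode_entry(flip_bit(good, 40))  # bit 40 lies inside the payload
    except MalformedInput as err:
        print("decode failed:", err)
```

In this sketch a single flipped bit is enough to make the decode fail, which is why a filesystem-level check can pass while the entry is still rejected: the checksum covers the content, not the filesystem metadata.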

It seems that my OSDs were corrupted (I don't know why), yet there is no trace of a problem in dmesg, in the SMART data, or in anything that xfs_repair could find.

These two OSDs account for 6 TB of my 40 TB array (triple replicated), and I'm pretty sure I can't recover from this. I will probably know in about 10 hours. Does anyone know of anything I can try to repair my OSDs?

My notes on the situation:

- An OSD can't find its superblock on the first start after a reboot; I have no idea why. The superblock is there, I can see it, and the OSD doesn't complain about it after that.
- The two OSD drives were bought at the same time and have similar serial numbers, but neither shows bad SMART stats or related dmesg errors.
- The host these were installed in had a funky BIOS that reported only half the RAM it had in it. It doesn't have ECC memory. I have since replaced the memory.
- xfs_repair has been run on both OSDs; it doesn't seem to have found anything, and the problem still persists.
- The cluster has been at HEALTH_OK every day, but overnight scrubbing has uncovered problematic PGs that I've had to repair every single night so far. This morning it went beyond my ability to repair.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
