Re: osdmap::decode crc error -- 13.2.7 -- most osds down

On 2/20/20 12:40 PM, Dan van der Ster wrote:
> Hi,
> 
> My turn.
> We suddenly have a big outage which is similar/identical to
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036519.html
> 
> Some of the osds are runnable, but most crash when they start -- crc
> error in osdmap::decode.
> I'm able to extract an osd map from a good osd and it decodes well
> with osdmaptool:
> 
> # ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-680/ --file osd.680.map
> 
> But when I try on one of the bad osds I get:
> 
> # ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.666.map
> terminate called after throwing an instance of 'ceph::buffer::malformed_input'
>   what():  buffer::malformed_input: bad crc, actual 822724616 != expected 2334082500
> *** Caught signal (Aborted) **
>  in thread 7f600aa42d00 thread_name:ceph-objectstor
>  ceph version 13.2.7 (71bd687b6e8b9424dd5e5974ed542595d8977416) mimic (stable)
>  1: (()+0xf5f0) [0x7f5ffefc45f0]
>  2: (gsignal()+0x37) [0x7f5ffdbae337]
>  3: (abort()+0x148) [0x7f5ffdbafa28]
>  4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f5ffe4be7d5]
>  5: (()+0x5e746) [0x7f5ffe4bc746]
>  6: (()+0x5e773) [0x7f5ffe4bc773]
>  7: (()+0x5e993) [0x7f5ffe4bc993]
>  8: (OSDMap::decode(ceph::buffer::list::iterator&)+0x160e) [0x7f6000f4168e]
>  9: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f6000f42e31]
>  10: (get_osdmap(ObjectStore*, unsigned int, OSDMap&,
> ceph::buffer::list&)+0x1d0) [0x55d30a489190]
>  11: (main()+0x5340) [0x55d30a3aae70]
>  12: (__libc_start_main()+0xf5) [0x7f5ffdb9a505]
>  13: (()+0x3a0f40) [0x55d30a483f40]
> Aborted (core dumped)
> 
> 
> 
> I think I want to inject the osdmap, but can't:
> 
> # ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.680.map
> osdmap (#-1:b65b78ab:::osdmap.2983572:0#) does not exist.
> 

Have you checked which osdmap epoch osd.680 is at, which one osd.666
is at, and which one the MONs are at?

Maybe the epochs differ?
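
A rough sketch of how the epochs could be compared, in case it helps. The
function names are made up for illustration. The first helper only parses
the object name from the set-osdmap error above (osdmap.2983572, which
presumably is the epoch of the map being injected). The others wrap
osdmaptool --print, ceph osd dump and the admin socket status command, and
assume a reachable cluster or a running OSD on the local host. Treat it as
a starting point, not a recovery procedure.

```shell
#!/bin/sh
# Hypothetical helper: extract the epoch from an object name such as
# "osdmap.2983572" inside a ceph-objectstore-tool message.
epoch_from_message() {
    printf '%s\n' "$1" | sed -n 's/.*osdmap\.\([0-9][0-9]*\).*/\1/p'
}

# Epoch of a map file already extracted with --op get-osdmap.
epoch_of_map_file() {
    osdmaptool --print "$1" | awk '$1 == "epoch" { print $2; exit }'
}

# Epoch the monitors currently report (needs a working ceph CLI).
epoch_of_mons() {
    ceph osd dump | awk '$1 == "epoch" { print $2; exit }'
}

# Oldest and newest map epochs held by a running OSD, read from its
# admin socket. Only works on the host where osd.$1 is up.
epochs_of_running_osd() {
    ceph daemon "osd.$1" status | grep -E '"(oldest|newest)_map"'
}
```

For example, epoch_of_map_file osd.680.map and epoch_of_mons can be
compared against the epoch named in the error from osd.666.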

Wido

> 
> How do I do this?
> 
> Thanks for any help!
> 
> dan
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 