On Thu, Aug 15, 2019 at 2:09 AM Troy Ablan <tablan@xxxxxxxxx> wrote:
>
> Paul,
>
> Thanks for the reply. All of these seemed to fail except for pulling
> the osdmap from the live cluster.
>
> -Troy
>
> -[~:#]- ceph-objectstore-tool --op get-osdmap --data-path
> /var/lib/ceph/osd/ceph-45/ --file osdmap45
> terminate called after throwing an instance of
> 'ceph::buffer::malformed_input'
>   what(): buffer::malformed_input: unsupported bucket algorithm: -1

That's this code.

  switch (alg) {
  case CRUSH_BUCKET_UNIFORM:
    size = sizeof(crush_bucket_uniform);
    break;
  case CRUSH_BUCKET_LIST:
    size = sizeof(crush_bucket_list);
    break;
  case CRUSH_BUCKET_TREE:
    size = sizeof(crush_bucket_tree);
    break;
  case CRUSH_BUCKET_STRAW:
    size = sizeof(crush_bucket_straw);
    break;
  case CRUSH_BUCKET_STRAW2:
    size = sizeof(crush_bucket_straw2);
    break;
  default:
    {
      char str[128];
      snprintf(str, sizeof(str), "unsupported bucket algorithm: %d", alg);
      throw buffer::malformed_input(str);
    }
  }

  CRUSH_BUCKET_UNIFORM = 1
  CRUSH_BUCKET_LIST    = 2
  CRUSH_BUCKET_TREE    = 3
  CRUSH_BUCKET_STRAW   = 4
  CRUSH_BUCKET_STRAW2  = 5

So valid values for bucket algorithms are 1 through 5 but, for whatever
reason, at least one of yours is being interpreted as "-1". This doesn't
seem like something that would just happen spontaneously with no changes
to the cluster.

What recent changes have you made to the osdmap?
What recent changes have you made to the crushmap?
Have you recently upgraded?

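To illustrate how a "-1" can come out of that default branch, here is a
minimal standalone sketch. It is not Ceph's code: the helper names
is_known_bucket_alg and describe_bucket_alg are made up for this mail, and
it assumes the algorithm field is read as an unsigned 32-bit value and then
printed with %d, which I have not re-checked against the 13.2.6 source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Bucket algorithm ids, as listed above.
enum : std::uint32_t {
  CRUSH_BUCKET_UNIFORM = 1,
  CRUSH_BUCKET_LIST    = 2,
  CRUSH_BUCKET_TREE    = 3,
  CRUSH_BUCKET_STRAW   = 4,
  CRUSH_BUCKET_STRAW2  = 5,
};

// Hypothetical helper (not a Ceph function): true if alg is one of the
// five ids the switch statement handles.
bool is_known_bucket_alg(std::uint32_t alg) {
  return alg >= CRUSH_BUCKET_UNIFORM && alg <= CRUSH_BUCKET_STRAW2;
}

// Hypothetical helper: returns "ok" for a known id, otherwise the same
// message text that the default branch formats.
// ASSUMPTION: the field is an unsigned 32-bit value shown through %d,
// so an all-ones value is displayed as -1.
std::string describe_bucket_alg(std::uint32_t alg) {
  if (is_known_bucket_alg(alg)) {
    return "ok";
  }
  char str[128];
  std::snprintf(str, sizeof(str), "unsupported bucket algorithm: %d",
                static_cast<int>(alg));
  return str;
}
```

Under those assumptions, describe_bucket_alg(0xFFFFFFFF) gives the same
"unsupported bucket algorithm: -1" text as your output, so the stored value
may simply be all-ones bytes (0xff 0xff 0xff 0xff) rather than a deliberate
negative number. That would point towards a damaged or mis-decoded map
rather than an unknown-but-real algorithm id.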
> *** Caught signal (Aborted) **
>   in thread 7f945ee04f00 thread_name:ceph-objectstor
>   ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> (stable)
>   1: (()+0xf5d0) [0x7f94531935d0]
>   2: (gsignal()+0x37) [0x7f9451d80207]
>   3: (abort()+0x148) [0x7f9451d818f8]
>   4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f945268f7d5]
>   5: (()+0x5e746) [0x7f945268d746]
>   6: (()+0x5e773) [0x7f945268d773]
>   7: (__cxa_rethrow()+0x49) [0x7f945268d9e9]
>   8: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0x18b8)
> [0x7f94553218d8]
>   9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x4ad) [0x7f94550ff4ad]
>   10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f9455101db1]
>   11: (get_osdmap(ObjectStore*, unsigned int, OSDMap&,
> ceph::buffer::list&)+0x1d0) [0x55de1f9a6e60]
>   12: (main()+0x5340) [0x55de1f8c8870]
>   13: (__libc_start_main()+0xf5) [0x7f9451d6c3d5]
>   14: (()+0x3adc10) [0x55de1f9a1c10]
> Aborted
>
> -[~:#]- ceph-objectstore-tool --op get-osdmap --data-path
> /var/lib/ceph/osd/ceph-46/ --file osdmap46
> terminate called after throwing an instance of
> 'ceph::buffer::malformed_input'
>   what(): buffer::malformed_input: unsupported bucket algorithm: -1
> *** Caught signal (Aborted) **
>   in thread 7f9ce4135f00 thread_name:ceph-objectstor
>   ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> (stable)
>   1: (()+0xf5d0) [0x7f9cd84c45d0]
>   2: (gsignal()+0x37) [0x7f9cd70b1207]
>   3: (abort()+0x148) [0x7f9cd70b28f8]
>   4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f9cd79c07d5]
>   5: (()+0x5e746) [0x7f9cd79be746]
>   6: (()+0x5e773) [0x7f9cd79be773]
>   7: (__cxa_rethrow()+0x49) [0x7f9cd79be9e9]
>   8: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0x18b8)
> [0x7f9cda6528d8]
>   9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x4ad) [0x7f9cda4304ad]
>   10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f9cda432db1]
>   11: (get_osdmap(ObjectStore*, unsigned int, OSDMap&,
> ceph::buffer::list&)+0x1d0) [0x55cea26c8e60]
>   12: (main()+0x5340) [0x55cea25ea870]
>   13: (__libc_start_main()+0xf5) [0x7f9cd709d3d5]
>   14: (()+0x3adc10) [0x55cea26c3c10]
> Aborted
>
> -[~:#]- ceph osd getmap -o osdmap
> got osdmap epoch 81298
>
> -[~:#]- ceph-objectstore-tool --op set-osdmap --data-path
> /var/lib/ceph/osd/ceph-46/ --file osdmap
> osdmap (#-1:92f679f2:::osdmap.81298:0#) does not exist.
>
> -[~:#]- ceph-objectstore-tool --op set-osdmap --data-path
> /var/lib/ceph/osd/ceph-45/ --file osdmap
> osdmap (#-1:92f679f2:::osdmap.81298:0#) does not exist.
>
>
>
> On 8/14/19 2:54 AM, Paul Emmerich wrote:
> > Starting point to debug/fix this would be to extract the osdmap from
> > one of the dead OSDs:
> >
> > ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/...
> >
> > Then try to run osdmaptool on that osdmap to see if it also crashes,
> > set some --debug options (don't know which one off the top of my
> > head).
> > Does it also crash? How does it differ from the map retrieved with
> > "ceph osd getmap"?
> >
> > You can also set the osdmap with "--op set-osdmap", does it help to
> > set the osdmap retrieved by "ceph osd getmap"?
> >
> > Paul
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad