Hi,

we had a problem on our production cluster (running 9.2.1) which caused /proc, /dev and /sys to be unmounted. During this time, we received the following error on a large number of OSDs (for various osdmap epochs):

Apr 15 15:25:19 kaa-99 ceph-osd[4167]: 2016-04-15 15:25:19.457774 7f1c817fd700 0 filestore(/local/ceph/osd.43) write couldn't open meta/-1/c188e154/osdmap.276293/0: (2) No such file or directory

After restarting the hosts, the OSDs now refuse to start with:

Apr 15 16:03:53 kaa-99 ceph-osd[4211]: -2> 2016-04-15 16:03:53.089842 7f8e9f840840 10 _load_class version success
Apr 15 16:03:53 kaa-99 ceph-osd[4211]: -1> 2016-04-15 16:03:53.089863 7f8e9f840840 20 osd.43 0 get_map 276424 - loading and decoding 0x7f8e9b841780
Apr 15 16:03:53 kaa-99 ceph-osd[4211]: 0> 2016-04-15 16:03:53.140754 7f8e9f840840 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e9f840840 time 2016-04-15 16:03:53.139563
osd/OSD.h: 847: FAILED assert(ret)

Inserting the map with ceph-objectstore-tool --op set-osdmap does not work and gives the following error:

osdmap (-1/c1882e94/osdmap.276507/0) does not exist.
2016-04-15 17:14:00.335751 7f4b4d75b840 1 journal close /dev/ssd/journal.43

How can I get the OSDs running again? I also created an issue for this in the tracker:

http://tracker.ceph.com/issues/15520

There are some similar entries, but I could not find a solution without recreating the OSD.

Markus

--
Markus Blank-Burian
AK Heuer, Institut für Physikalische Chemie, WWU Münster
Corrensstraße 28/30
Raum E005
Tel.: 0251 / 83 29178
E-Mail: blankburian@xxxxxx
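
P.S. Since the write errors occurred for various osdmap epochs, it may help to collect the affected epochs per OSD before attempting any repair. The following is only a minimal sketch, assuming the log lines have exactly the format quoted above; the function name `missing_osdmap_epochs` and the regular expression are illustrative and not part of any Ceph tooling.

```python
import re

# Matches the filestore object path in messages such as
# "write couldn't open meta/-1/c188e154/osdmap.276293/0: (2) No such file or directory"
# and captures the epoch number that follows "osdmap.".
MISSING_MAP = re.compile(r"couldn't open \S*?osdmap\.(\d+)/")


def missing_osdmap_epochs(lines):
    """Return the sorted, de-duplicated osdmap epochs named in 'couldn't open' errors."""
    epochs = set()
    for line in lines:
        match = MISSING_MAP.search(line)
        if match:
            epochs.add(int(match.group(1)))
    return sorted(epochs)


# Two lines taken from the log excerpts above; only the first is a write error.
sample = [
    "Apr 15 15:25:19 kaa-99 ceph-osd[4167]: 2016-04-15 15:25:19.457774 "
    "7f1c817fd700 0 filestore(/local/ceph/osd.43) write couldn't open "
    "meta/-1/c188e154/osdmap.276293/0: (2) No such file or directory",
    "Apr 15 16:03:53 kaa-99 ceph-osd[4211]: -1> 2016-04-15 16:03:53.089863 "
    "7f8e9f840840 20 osd.43 0 get_map 276424 - loading and decoding 0x7f8e9b841780",
]

print(missing_osdmap_epochs(sample))
```

In practice the input would be the OSD's log or the syslog lines for one ceph-osd process, read line by line. This list covers only epochs whose writes were logged as failed; it does not show whether other maps are also missing.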