If 2.14 is part of a non-existent pool, you should be able to rename it out of current/ in the osd directory to prevent the osd from seeing it on startup.

-Sam
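(A minimal sketch of that rename, assuming the default data path for osd.3; how the daemon is stopped differs per init system, and the ".parked" suffix is an arbitrary name:)

    # stop the OSD first so nothing touches the store
    service ceph stop osd.3              # or: stop ceph-osd id=3 (upstart)
    cd /var/lib/ceph/osd/ceph-3
    # park the PG directory outside current/ rather than deleting it,
    # so it can be moved back if 2.14 turns out to be needed after all
    mv current/2.14_head 2.14_head.parked
    # if a matching TEMP collection exists, park that too
    [ -d current/2.14_TEMP ] && mv current/2.14_TEMP 2.14_TEMP.parked
    service ceph start osd.3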
----- Original Message -----
From: "Berant Lemmenes" <berant@xxxxxxxxxxxx>
To: "Samuel Just" <sjust@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Tuesday, May 19, 2015 12:58:30 PM
Subject: Re: OSD unable to start (giant -> hammer)

Hello,

Here are the steps I performed and where I sit now.

Step 1) Used 'ceph-objectstore-tool list' to build a list of all PGs not associated with the three pools (rbd, data, metadata) that are actually in use on this cluster.

Step 2) Did a 'ceph-objectstore-tool remove' of each of those PGs.

When I then started the OSD, it complained about PGs that were NOT in the 'ceph-objectstore-tool list' output but WERE present on the filesystem of the OSD in question. So:

Step 3) Iterating over all of the PGs that were on disk and using 'ceph-objectstore-tool info', I made a list of all PGs that returned ENOENT.

Step 4) Ran 'ceph-objectstore-tool remove' to remove all of those as well.
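(For reference, steps 1-4 look roughly like the sketch below. This is a reconstruction, not the exact commands used: it assumes hammer's ceph-objectstore-tool flags (--data-path/--journal-path, --op list-pgs/info/remove), the default data path for osd.3, and that data, metadata and rbd are pools 0, 1 and 2.)

    OSD=/var/lib/ceph/osd/ceph-3
    JRN=$OSD/journal

    # Steps 1+2: list every PG the store knows about, keep only those
    # outside the live pools, and remove them
    ceph-objectstore-tool --data-path $OSD --journal-path $JRN --op list-pgs |
      grep -Ev '^(0|1|2)\.' |
      while read pg; do
        ceph-objectstore-tool --data-path $OSD --journal-path $JRN \
          --pgid "$pg" --op remove
      done

    # Steps 3+4: walk the PG directories actually on disk and print any
    # whose metadata lookup fails (ENOENT), then remove those the same way
    for d in $OSD/current/*_head; do
      pg=$(basename "$d" _head)
      ceph-objectstore-tool --data-path $OSD --journal-path $JRN \
        --pgid "$pg" --op info >/dev/null 2>&1 || echo "$pg"
    done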
Now when starting osd.3 I get an "unable to load metadata" error for a PG that, according to 'ceph pg 2.14 query', is not present (and shouldn't be) on osd.3. Shown below with OSD debugging at 20:

<snip>
   -23> 2015-05-19 15:15:12.712036 7fb079a20780 20 read_log 39533'174051 (39533'174050) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2811937 2015-05-18 07:18:42.859501
   -22> 2015-05-19 15:15:12.712066 7fb079a20780 20 read_log 39533'174052 (39533'174051) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2812374 2015-05-18 07:33:21.973157
   -21> 2015-05-19 15:15:12.712096 7fb079a20780 20 read_log 39533'174053 (39533'174052) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2812861 2015-05-18 07:48:23.098343
   -20> 2015-05-19 15:15:12.712127 7fb079a20780 20 read_log 39533'174054 (39533'174053) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2813371 2015-05-18 08:03:54.226512
   -19> 2015-05-19 15:15:12.712157 7fb079a20780 20 read_log 39533'174055 (39533'174054) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2813922 2015-05-18 08:18:20.351421
   -18> 2015-05-19 15:15:12.712187 7fb079a20780 20 read_log 39533'174056 (39533'174055) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2814396 2015-05-18 08:33:56.476035
   -17> 2015-05-19 15:15:12.712221 7fb079a20780 20 read_log 39533'174057 (39533'174056) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2814971 2015-05-18 08:48:22.605674
   -16> 2015-05-19 15:15:12.712252 7fb079a20780 20 read_log 39533'174058 (39533'174057) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2815407 2015-05-18 09:02:48.720181
   -15> 2015-05-19 15:15:12.712282 7fb079a20780 20 read_log 39533'174059 (39533'174058) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2815434 2015-05-18 09:03:43.727839
   -14> 2015-05-19 15:15:12.712312 7fb079a20780 20 read_log 39533'174060 (39533'174059) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2815889 2015-05-18 09:17:49.846406
   -13> 2015-05-19 15:15:12.712342 7fb079a20780 20 read_log 39533'174061 (39533'174060) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2816358 2015-05-18 09:32:50.969457
   -12> 2015-05-19 15:15:12.712372 7fb079a20780 20 read_log 39533'174062 (39533'174061) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2816840 2015-05-18 09:47:52.091524
   -11> 2015-05-19 15:15:12.712403 7fb079a20780 20 read_log 39533'174063 (39533'174062) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2816861 2015-05-18 09:48:22.096309
   -10> 2015-05-19 15:15:12.712433 7fb079a20780 20 read_log 39533'174064 (39533'174063) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by client.18119.0:2817714 2015-05-18 10:02:53.222749
    -9> 2015-05-19 15:15:12.713130 7fb079a20780 10 read_log done
    -8> 2015-05-19 15:15:12.713550 7fb079a20780 10 osd.3 pg_epoch: 39533 pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive] handle_loaded
    -7> 2015-05-19 15:15:12.713570 7fb079a20780  5 osd.3 pg_epoch: 39533 pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] exit Initial 0.097986 0 0.000000
    -6> 2015-05-19 15:15:12.713587 7fb079a20780  5 osd.3 pg_epoch: 39533 pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] enter Reset
    -5> 2015-05-19 15:15:12.713601 7fb079a20780 20 osd.3 pg_epoch: 39533 pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] set_last_peering_reset 39533
    -4> 2015-05-19 15:15:12.713614 7fb079a20780 10 osd.3 pg_epoch: 39533 pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] Clearing blocked outgoing recovery messages
    -3> 2015-05-19 15:15:12.713629 7fb079a20780 10 osd.3 pg_epoch: 39533 pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] Not blocking outgoing recovery messages
    -2> 2015-05-19 15:15:12.713643 7fb079a20780 10 osd.3 39533 load_pgs loaded pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533 pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] log((37945'171063,39533'174064], crt=39533'174062)
    -1> 2015-05-19 15:15:12.713658 7fb079a20780 10 osd.3 39533 pgid 2.14 coll 2.14_head
     0> 2015-05-19 15:15:12.716475 7fb079a20780 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)' thread 7fb079a20780 time 2015-05-19 15:15:12.715425
osd/PG.cc: 2860: FAILED assert(0 == "unable to open pg metadata")

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xb1784f]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) [0x793dd8]
 3: (OSD::load_pgs()+0x147f) [0x683dff]
 4: (OSD::init()+0x1448) [0x6930b8]
 5: (main()+0x26b9) [0x62fd89]
 6: (__libc_start_main()+0xed) [0x7fb07767876d]
 7: ceph-osd() [0x635679]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none          0/ 1 lockdep       0/ 1 context      1/ 1 crush        1/ 5 mds
   1/ 5 mds_balancer  1/ 5 mds_locker    1/ 5 mds_log      1/ 5 mds_log_expire  1/ 5 mds_migrator
   0/ 1 buffer        0/ 1 timer         0/ 1 filer        0/ 1 striper      0/ 1 objecter
   0/ 5 rados         0/ 5 rbd           0/ 5 rbd_replay   0/ 5 journaler    0/ 5 objectcacher
   0/ 5 client       20/20 osd           0/ 5 optracker    0/ 5 objclass     1/ 3 filestore
   1/ 3 keyvaluestore 1/ 3 journal       0/ 5 ms           1/ 5 mon          0/10 monc
   1/ 5 paxos         0/ 5 tp            1/ 5 auth         1/ 5 crypto       1/ 1 finisher
   1/ 5 heartbeatmap  1/ 5 perfcounter   1/ 5 rgw          1/10 civetweb     1/ 5 javaclient
   1/ 5 asok          1/ 1 throttle      0/ 0 refs         1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file
--- end dump of recent events ---

terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
 in thread 7fb079a20780
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: ceph-osd() [0xa1fe55]
 2: (()+0xfcb0) [0x7fb078a60cb0]
 3: (gsignal()+0x35) [0x7fb07768d0d5]
 4: (abort()+0x17b) [0x7fb07769083b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d]
 6: (()+0xb5846) [0x7fb077fdc846]
 7: (()+0xb5873) [0x7fb077fdc873]
 8: (()+0xb596e) [0x7fb077fdc96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
 10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) [0x793dd8]
 11: (OSD::load_pgs()+0x147f) [0x683dff]
 12: (OSD::init()+0x1448) [0x6930b8]
 13: (main()+0x26b9) [0x62fd89]
 14: (__libc_start_main()+0xed) [0x7fb07767876d]
 15: ceph-osd() [0x635679]
2015-05-19 15:15:12.812704 7fb079a20780 -1 *** Caught signal (Aborted) **
 in thread 7fb079a20780
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: ceph-osd() [0xa1fe55]
 2: (()+0xfcb0) [0x7fb078a60cb0]
 3: (gsignal()+0x35) [0x7fb07768d0d5]
 4: (abort()+0x17b) [0x7fb07769083b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d]
 6: (()+0xb5846) [0x7fb077fdc846]
 7: (()+0xb5873) [0x7fb077fdc873]
 8: (()+0xb596e) [0x7fb077fdc96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
 10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) [0x793dd8]
 11: (OSD::load_pgs()+0x147f) [0x683dff]
 12: (OSD::init()+0x1448) [0x6930b8]
 13: (main()+0x26b9) [0x62fd89]
 14: (__libc_start_main()+0xed) [0x7fb07767876d]
 15: ceph-osd() [0x635679]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2015-05-19 15:15:12.812704 7fb079a20780 -1 *** Caught signal (Aborted) **
 in thread 7fb079a20780
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: ceph-osd() [0xa1fe55]
 2: (()+0xfcb0) [0x7fb078a60cb0]
 3: (gsignal()+0x35) [0x7fb07768d0d5]
 4: (abort()+0x17b) [0x7fb07769083b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d]
 6: (()+0xb5846) [0x7fb077fdc846]
 7: (()+0xb5873) [0x7fb077fdc873]
 8: (()+0xb596e) [0x7fb077fdc96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
 10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) [0x793dd8]
 11: (OSD::load_pgs()+0x147f) [0x683dff]
 12: (OSD::init()+0x1448) [0x6930b8]
 13: (main()+0x26b9) [0x62fd89]
 14: (__libc_start_main()+0xed) [0x7fb07767876d]
 15: ceph-osd() [0x635679]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none          0/ 1 lockdep       0/ 1 context      1/ 1 crush        1/ 5 mds
   1/ 5 mds_balancer  1/ 5 mds_locker    1/ 5 mds_log      1/ 5 mds_log_expire  1/ 5 mds_migrator
   0/ 1 buffer        0/ 1 timer         0/ 1 filer        0/ 1 striper      0/ 1 objecter
   0/ 5 rados         0/ 5 rbd           0/ 5 rbd_replay   0/ 5 journaler    0/ 5 objectcacher
   0/ 5 client       20/20 osd           0/ 5 optracker    0/ 5 objclass     1/ 3 filestore
   1/ 3 keyvaluestore 1/ 3 journal       0/ 5 ms           1/ 5 mon          0/10 monc
   1/ 5 paxos         0/ 5 tp            1/ 5 auth         1/ 5 crypto       1/ 1 finisher
   1/ 5 heartbeatmap  1/ 5 perfcounter   1/ 5 rgw          1/10 civetweb     1/ 5 javaclient
   1/ 5 asok          1/ 1 throttle      0/ 0 refs         1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file
--- end dump of recent events ---

Here is the PG info for 2.14:

ceph pg 2.14 query
{ "state": "active+undersized+degraded", "snap_trimq": "[]", "epoch": 39556,
  "up": [8, 7], "acting": [8, 7], "actingbackfill": ["7", "8"],
  "info": {
    "pgid": "2.14", "last_update": "39533'175859", "last_complete": "39533'175859",
    "log_tail": "36964'172858", "last_user_version": 175859, "last_backfill": "MAX",
    "purged_snaps": "[]",
    "history": {
      "epoch_created": 1, "last_epoch_started": 39536, "last_epoch_clean": 39536,
      "last_epoch_split": 0, "same_up_since": 39534, "same_interval_since": 39534,
      "same_primary_since": 39527, "last_scrub": "39533'175859",
      "last_scrub_stamp": "2015-05-18 05:23:02.952523", "last_deep_scrub": "39533'175859",
      "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523",
      "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523"},
    "stats": {
      "version": "39533'175859", "reported_seq": "281883", "reported_epoch": "39556",
      "state": "active+undersized+degraded",
      "last_fresh": "2015-05-19 06:41:09.002111", "last_change": "2015-05-18 10:19:22.277851",
      "last_active": "2015-05-19 06:41:09.002111", "last_clean": "2015-05-18 06:41:38.906417",
      "last_became_active": "2013-05-07 04:23:31.972742",
      "last_unstale": "2015-05-19 06:41:09.002111",
      "last_undegraded": "2015-05-18 10:18:37.449550",
      "last_fullsized": "2015-05-18 10:18:37.449550",
      "mapping_epoch": 39527, "log_start": "36964'172858", "ondisk_log_start": "36964'172858",
      "created": 1, "last_epoch_clean": 39536, "parent": "0.0", "parent_split_bits": 0,
      "last_scrub": "39533'175859", "last_scrub_stamp": "2015-05-18 05:23:02.952523",
      "last_deep_scrub": "39533'175859", "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523",
      "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523",
      "log_size": 3001, "ondisk_log_size": 3001, "stats_invalid": "0",
      "stat_sum": {
        "num_bytes": 441982976, "num_objects": 106, "num_object_clones": 0,
        "num_object_copies": 318, "num_objects_missing_on_primary": 0,
        "num_objects_degraded": 106, "num_objects_misplaced": 0, "num_objects_unfound": 0,
        "num_objects_dirty": 11, "num_whiteouts": 0,
        "num_read": 61399, "num_read_kb": 1285319, "num_write": 135192, "num_write_kb": 2422029,
        "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0,
        "num_objects_recovered": 79, "num_bytes_recovered": 329883648, "num_keys_recovered": 0,
        "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0},
      "stat_cat_sum": {},
      "up": [8, 7], "acting": [8, 7], "blocked_by": [],
      "up_primary": 8, "acting_primary": 8},
    "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 39536,
    "hit_set_history": {
      "current_last_update": "0'0", "current_last_stamp": "0.000000",
      "current_info": { "begin": "0.000000", "end": "0.000000", "version": "0'0"},
      "history": []}},
"last_complete": "39533'175859", "log_tail": "36964'172858", "last_user_version": 175859, "last_backfill": "MAX", "purged_snaps": "[]", "history": { "epoch_created": 1, "last_epoch_started": 39536, "last_epoch_clean": 39536, "last_epoch_split": 0, "same_up_since": 39534, "same_interval_since": 39534, "same_primary_since": 39527, "last_scrub": "39533'175859", "last_scrub_stamp": "2015-05-18 05:23:02.952523", "last_deep_scrub": "39533'175859", "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523", "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523"}, "stats": { "version": "39533'175858", "reported_seq": "281598", "reported_epoch": "39533", "state": "active+clean", "last_fresh": "2015-05-13 21:58:43.553887", "last_change": "2015-05-12 22:50:16.011917", "last_active": "2015-05-13 21:58:43.553887", "last_clean": "2015-05-13 21:58:43.553887", "last_became_active": "2013-05-07 04:23:31.972742", "last_unstale": "2015-05-13 21:58:43.553887", "last_undegraded": "2015-05-13 21:58:43.553887", "last_fullsized": "2015-05-13 21:58:43.553887", "mapping_epoch": 39527, "log_start": "36964'172857", "ondisk_log_start": "36964'172857", "created": 1, "last_epoch_clean": 39529, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "39533'175857", "last_scrub_stamp": "2015-05-12 22:50:16.011867", "last_deep_scrub": "39533'175856", "last_deep_scrub_stamp": "2015-05-10 10:30:24.933431", "last_clean_scrub_stamp": "2015-05-12 22:50:16.011867", "log_size": 3001, "ondisk_log_size": 3001, "stats_invalid": "0", "stat_sum": { "num_bytes": 441982976, "num_objects": 106, "num_object_clones": 0, "num_object_copies": 315, "num_objects_missing_on_primary": 0, "num_objects_degraded": 0, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 11, "num_whiteouts": 0, "num_read": 61157, "num_read_kb": 1281187, "num_write": 135192, "num_write_kb": 2422029, "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0, "num_objects_recovered": 79, "num_bytes_recovered": 329883648, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0}, "stat_cat_sum": {}, "up": [ 8, 7], "acting": [ 8, 7], "blocked_by": [], "up_primary": 8, "acting_primary": 8}, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 39536, "hit_set_history": { "current_last_update": "0'0", "current_last_stamp": "0.000000", "current_info": { "begin": "0.000000", "end": "0.000000", "version": "0'0"}, "history": []}}], "recovery_state": [ { "name": "Started\/Primary\/Active", "enter_time": "2015-05-18 10:18:37.449561", "might_have_unfound": [], "recovery_progress": { "backfill_targets": [], "waiting_on_backfill": [], "last_backfill_started": "0\/\/0\/\/-1", "backfill_info": { "begin": "0\/\/0\/\/-1", "end": "0\/\/0\/\/-1", "objects": []}, "peer_backfill_info": [], "backfills_in_flight": [], "recovering": [], "pg_backend": { "pull_from_peer": [], "pushing": []}}, "scrub": { "scrubber.epoch_start": "39527", "scrubber.active": 0, "scrubber.block_writes": 0, "scrubber.waiting_on": 0, "scrubber.waiting_on_whom": []}}, { "name": "Started", "enter_time": "2015-05-18 10:18:05.335040"}], "agent_state": {}} On Mon, May 18, 2015 at 2:34 PM, Berant Lemmenes <berant@xxxxxxxxxxxx> wrote: > Sam, > > Thanks for taking a look. It does seem to fit my issue. Would just > removing the 5.0_head directory be appropriate or would using > ceph-objectstore-tool be better? 
>
> Thanks,
> Berant
>
> On Mon, May 18, 2015 at 1:47 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>
>> You have most likely hit http://tracker.ceph.com/issues/11429. There
>> are some workarounds in the bugs marked as duplicates of that bug, or you
>> can wait for the next hammer point release.
>> -Sam
>>
>> ----- Original Message -----
>> From: "Berant Lemmenes" <berant@xxxxxxxxxxxx>
>> To: ceph-users@xxxxxxxxxxxxxx
>> Sent: Monday, May 18, 2015 10:24:38 AM
>> Subject: OSD unable to start (giant -> hammer)
>>
>> Hello all,
>>
>> I've encountered a problem when upgrading my single-node home cluster
>> from giant to hammer, and I would greatly appreciate any insight.
>>
>> I upgraded the packages like normal, then proceeded to restart the mon
>> and, once that came back, restarted the first OSD (osd.3). However, it
>> subsequently won't start and crashes with the following failed assertion:
>>
>> osd/OSD.h: 716: FAILED assert(ret)
>>
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xb1784f]
>> 2: (OSD::load_pgs()+0x277b) [0x6850fb]
>> 3: (OSD::init()+0x1448) [0x6930b8]
>> 4: (main()+0x26b9) [0x62fd89]
>> 5: (__libc_start_main()+0xed) [0x7f2345bc976d]
>> 6: ceph-osd() [0x635679]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none          0/ 1 lockdep       0/ 1 context      1/ 1 crush        1/ 5 mds
>>    1/ 5 mds_balancer  1/ 5 mds_locker    1/ 5 mds_log      1/ 5 mds_log_expire  1/ 5 mds_migrator
>>    0/ 1 buffer        0/ 1 timer         0/ 1 filer        0/ 1 striper      0/ 1 objecter
>>    0/ 5 rados         0/ 5 rbd           0/ 5 rbd_replay   0/ 5 journaler    0/ 5 objectcacher
>>    0/ 5 client        0/ 5 osd           0/ 5 optracker    0/ 5 objclass     1/ 3 filestore
>>    1/ 3 keyvaluestore 1/ 3 journal       0/ 5 ms           1/ 5 mon          0/10 monc
>>    1/ 5 paxos         0/ 5 tp            1/ 5 auth         1/ 5 crypto       1/ 1 finisher
>>    1/ 5 heartbeatmap  1/ 5 perfcounter   1/ 5 rgw          1/10 civetweb     1/ 5 javaclient
>>    1/ 5 asok          1/ 1 throttle      0/ 0 refs         1/ 5 xio
>>   -2/-2 (syslog threshold)
>>   99/99 (stderr threshold)
>>   max_recent 10000
>>   max_new 1000
>>   log_file
>> --- end dump of recent events ---
>>
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> *** Caught signal (Aborted) **
>> in thread 7f2347f71780
>>
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> 1: ceph-osd() [0xa1fe55]
>> 2: (()+0xfcb0) [0x7f2346fb1cb0]
>> 3: (gsignal()+0x35) [0x7f2345bde0d5]
>> 4: (abort()+0x17b) [0x7f2345be183b]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
>> 6: (()+0xb5846) [0x7f234652d846]
>> 7: (()+0xb5873) [0x7f234652d873]
>> 8: (()+0xb596e) [0x7f234652d96e]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
>> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
>> 11: (OSD::init()+0x1448) [0x6930b8]
>> 12: (main()+0x26b9) [0x62fd89]
>> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
>> 14: ceph-osd() [0x635679]
>>
>> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal (Aborted) **
>> in thread 7f2347f71780
>>
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> 1: ceph-osd() [0xa1fe55]
>> 2: (()+0xfcb0) [0x7f2346fb1cb0]
>> 3: (gsignal()+0x35) [0x7f2345bde0d5]
>> 4: (abort()+0x17b) [0x7f2345be183b]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
>> 6: (()+0xb5846) [0x7f234652d846]
>> 7: (()+0xb5873) [0x7f234652d873]
>> 8: (()+0xb596e) [0x7f234652d96e]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
>> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
>> 11: (OSD::init()+0x1448) [0x6930b8]
>> 12: (main()+0x26b9) [0x62fd89]
>> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
>> 14: ceph-osd() [0x635679]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- begin dump of recent events ---
>> 0> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal (Aborted) **
>> in thread 7f2347f71780
>>
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> 1: ceph-osd() [0xa1fe55]
>> 2: (()+0xfcb0) [0x7f2346fb1cb0]
>> 3: (gsignal()+0x35) [0x7f2345bde0d5]
>> 4: (abort()+0x17b) [0x7f2345be183b]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
>> 6: (()+0xb5846) [0x7f234652d846]
>> 7: (()+0xb5873) [0x7f234652d873]
>> 8: (()+0xb596e) [0x7f234652d96e]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
>> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
>> 11: (OSD::init()+0x1448) [0x6930b8]
>> 12: (main()+0x26b9) [0x62fd89]
>> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
>> 14: ceph-osd() [0x635679]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none          0/ 1 lockdep       0/ 1 context      1/ 1 crush        1/ 5 mds
>>    1/ 5 mds_balancer  1/ 5 mds_locker    1/ 5 mds_log      1/ 5 mds_log_expire  1/ 5 mds_migrator
>>    0/ 1 buffer        0/ 1 timer         0/ 1 filer        0/ 1 striper      0/ 1 objecter
>>    0/ 5 rados         0/ 5 rbd           0/ 5 rbd_replay   0/ 5 journaler    0/ 5 objectcacher
>>    0/ 5 client        0/ 5 osd           0/ 5 optracker    0/ 5 objclass     1/ 3 filestore
>>    1/ 3 keyvaluestore 1/ 3 journal       0/ 5 ms           1/ 5 mon          0/10 monc
>>    1/ 5 paxos         0/ 5 tp            1/ 5 auth         1/ 5 crypto       1/ 1 finisher
>>    1/ 5 heartbeatmap  1/ 5 perfcounter   1/ 5 rgw          1/10 civetweb     1/ 5 javaclient
>>    1/ 5 asok          1/ 1 throttle      0/ 0 refs         1/ 5 xio
>>   -2/-2 (syslog threshold)
>>   99/99 (stderr threshold)
>>   max_recent 10000
>>   max_new 1000
>>   log_file
>> --- end dump of recent events ---
>>
>> I've included a 'ceph osd dump' here: http://pastebin.com/RKbaY7nv
>>
>> ceph osd tree:
>>
>> ID WEIGHT   TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 24.14000 root default
>> -3        0 rack unknownrack
>> -2        0 host ceph-test
>> -4 24.14000 host ceph01
>>  0  1.50000      osd.0          down        0          1.00000
>>  2  1.50000      osd.2          down        0          1.00000
>>  3  1.50000      osd.3          down  1.00000          1.00000
>>  5  2.00000      osd.5            up  1.00000          1.00000
>>  6  2.00000      osd.6            up  1.00000          1.00000
>>  7  2.00000      osd.7            up  1.00000          1.00000
>>  8  2.00000      osd.8            up  1.00000          1.00000
>>  9  2.00000      osd.9            up  1.00000          1.00000
>> 10  2.00000      osd.10           up  1.00000          1.00000
>>  4  4.00000      osd.4            up  1.00000          1.00000
>>  1  3.64000      osd.1            up  1.00000          1.00000
>>
>> Note that osd.0 and osd.2 were down prior to the upgrade and the cluster
>> was healthy (these are failed disks that have been out for some time,
>> just not removed from CRUSH).
>>
>> I've also included a log with OSD debugging set to 20 here:
>> https://dl.dropboxusercontent.com/u/1043493/osd.3.log.gz
>>
>> Looking through that file, it appears the last pg that it loads
>> successfully is 2.3f6; then it moves to 5.0:
>>
>> -3> 2015-05-18 12:25:24.292091 7f6f407f9780 10 osd.3 39533 load_pgs loaded pg[2.3f6( v 39533'289849 (37945'286848,39533'289849] local-les=39532 n=99 ec=1 les/c 39532/39532 39531/39531/39523) [5,4,3] r=2 lpr=39533 pi=34961-39530/34 crt=39533'289846 lcod 0'0 inactive NOTIFY] log((37945'286848,39533'289849], crt=39533'289846)
>> -2> 2015-05-18 12:25:24.292100 7f6f407f9780 10 osd.3 39533 pgid 5.0 coll 5.0_head
>> -1> 2015-05-18 12:25:24.570188 7f6f407f9780 20 osd.3 0 get_map 34144 - loading and decoding 0x411fd80
>> 0> 2015-05-18 12:26:02.758914 7f6f407f9780 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f6f407f9780 time 2015-05-18 12:25:24.620468
>>
>> osd/OSD.h: 716: FAILED assert(ret)
>>
>> [snip]
>>
>> And 5.0 does not appear in a 'ceph pg dump'.
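(A quick hedged sketch of that cross-check, confirming 5.0 comes from a pool that no longer exists; it assumes the default osd.3 data path, and both commands are available in hammer:)

    # live pools, and any live PGs whose pgid starts with "5."
    ceph osd lspools
    ceph pg dump pgs_brief | awk '$1 ~ /^5\./'
    # versus what is actually on disk for this OSD
    ls /var/lib/ceph/osd/ceph-3/current | grep '^5\.'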
>>
>> Thanks in advance!
>> Berant
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com