On 08/24/2017 06:58 AM, Sage Weil wrote:
Okay, so I think the combination of (1) removing empty PGs and (2) pruning
past_intervals will help. (1) can be scripted by looking in
current/$pg_HEAD directories and picking out the ones with 0 or 1 objects
in them, doing a ceph-objectstore-tool export to make a backup (just in
case), and then removing them (with ceph-objectstore-tool). Be careful of
PGs for empty pools, since those will be naturally empty (and you want
to keep them).
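A minimal sketch of how (1) might be scripted, assuming the filestore
layout under current/ and the usual ceph-objectstore-tool
--data-path/--pgid/--op flags. The OSD path and /root backup location are
illustrative, DRY_RUN=1 (the default) only prints the commands, and PGs
belonging to legitimately empty pools still have to be filtered out by hand:

```shell
#!/bin/sh
# Sketch: find PG _head dirs holding 0 or 1 objects, export each one as
# a backup, then remove it. DRY_RUN=1 (the default) only echoes commands.
# NOTE: this cannot tell apart PGs of pools that are *supposed* to be
# empty -- review the printed list before running for real.
# NOTE: some releases require adding --force to --op remove.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

cleanup_empty_pgs() {
    osd_path=$1
    for dir in "$osd_path"/current/*_head; do
        [ -d "$dir" ] || continue
        # count objects in the PG dir, excluding the __head_* marker file
        count=$(find "$dir" -type f ! -name '__head_*' | wc -l)
        if [ "$count" -le 1 ]; then
            pgid=$(basename "$dir" _head)        # e.g. 19.f1e
            run ceph-objectstore-tool --data-path "$osd_path" \
                --pgid "$pgid" --op export --file "/root/$pgid.export"
            run ceph-objectstore-tool --data-path "$osd_path" \
                --pgid "$pgid" --op remove
        fi
    done
}

# only touch a real store if it exists; the OSD must be stopped first
if [ -d "${OSD_PATH:-/var/lib/ceph/osd/ceph-299}/current" ]; then
    cleanup_empty_pgs "${OSD_PATH:-/var/lib/ceph/osd/ceph-299}"
fi
```

Remember that ceph-objectstore-tool can only be run against a stopped OSD.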
For (2), see the wip-prune-past-intervals-jewel branch in ceph-ci.git. If
that is applied to the kraken branch it ought to work (although it's
untested). Alternatively, you can just upgrade to luminous, as it
implements a more sophisticated version of the same thing. You need to
upgrade the mons, mark all OSDs down, upgrade the OSDs and start at least
one of them, and then set 'ceph osd require-osd-release luminous' before
it'll switch to the new past_intervals representation. Definitely test it
on your test cluster to ensure it reduces the memory usage!
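For reference, the order described above comes down to roughly the
following. This is only a sketch: package-upgrade steps and systemd unit
names vary by distro, and the noout flag is my addition to avoid
rebalancing while everything is down. It is captured as a plan (printed,
not executed) so it can be reviewed first:

```shell
#!/bin/sh
# Sketch of the kraken -> luminous upgrade order described above,
# printed as a reviewable plan rather than executed.
plan=$(cat <<'EOF'
# 1. upgrade the monitor packages first, then restart the mons
systemctl restart ceph-mon.target        # on each mon host

# 2. mark all OSDs down, upgrade them, and start at least one
ceph osd set noout
systemctl stop ceph-osd.target           # on each OSD host
# ... upgrade ceph packages on the OSD hosts ...
systemctl start ceph-osd.target

# 3. only now flip the switch that enables the new (pruned)
#    past_intervals representation
ceph osd require-osd-release luminous
ceph osd unset noout
EOF
)
printf '%s\n' "$plan"
```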
If that doesn't sort things out we'll need to see a heap profile for an
OOMing OSD to make sure we know what is using all of the RAM...
sage
Well, a big thank you, Sage. We tested the upgrade on the test cluster,
and it worked in an awesome way. We did not even need to remove any
PG with the tool.
I read about the pruning change in the release notes, so when our attempts
to start the cluster failed, we tried upgrading, but it did not help. It
turned out that we had missed the 'ceph osd require-osd-release luminous'
step. I mean, we were looking at that command in the upgrade section of
the release notes and said to each other "it does not matter, it would
only restrict the old OSDs from joining" and moved on. Damn, we would have
been up a week ago.
Having said that, I think the release notes should highlight this step in
the future.
Now we have upgraded the production cluster and it is up and running. The
memory footprint is down to about a tenth of what it was; the largest RAM
usage I saw on an OSD was about 6.5GB.
But we hit some issues, in particular OSDs crashing with "FAILED
assert(interval.last > last)".
logs:
-34> 2017-08-26 00:38:00.505114 7f14556b4d00 0 osd.299 1085665
load_pgs opened 455 pgs
-33> 2017-08-26 00:38:00.505787 7f14556b4d00 10 osd.299 1085665
19.f1e needs 1050342-1085230
-32> 2017-08-26 00:38:00.505814 7f14556b4d00 1 osd.299 1085665
build_past_intervals_parallel over 1050342-1085230
-31> 2017-08-26 00:38:00.505818 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050342
-30> 2017-08-26 00:38:00.505824 7f14556b4d00 20 osd.299 0 get_map
1050342 - loading and decoding 0x7f14b3dfb0c0
-29> 2017-08-26 00:38:00.506245 7f14556b4d00 10 osd.299 0 add_map_bl
1050342 780781 bytes
-28> 2017-08-26 00:38:00.508539 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050342 pg 19.f1e first map, acting
[80] up [80], same_interval_since = 1050342
-27> 2017-08-26 00:38:00.508547 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050343
-26> 2017-08-26 00:38:00.508550 7f14556b4d00 20 osd.299 0 get_map
1050343 - loading and decoding 0x7f14b3dfad80
-25> 2017-08-26 00:38:00.508997 7f14556b4d00 10 osd.299 0 add_map_bl
1050343 781371 bytes
-24> 2017-08-26 00:38:00.511176 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050344
-23> 2017-08-26 00:38:00.511196 7f14556b4d00 20 osd.299 0 get_map
1050344 - loading and decoding 0x7f14b3dfb740
-22> 2017-08-26 00:38:00.511625 7f14556b4d00 10 osd.299 0 add_map_bl
1050344 782446 bytes
-21> 2017-08-26 00:38:00.513813 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050345
-20> 2017-08-26 00:38:00.513820 7f14556b4d00 20 osd.299 0 get_map
1050345 - loading and decoding 0x7f14b3dfba80
-19> 2017-08-26 00:38:00.514260 7f14556b4d00 10 osd.299 0 add_map_bl
1050345 782071 bytes
-18> 2017-08-26 00:38:00.516463 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050346
-17> 2017-08-26 00:38:00.516488 7f14556b4d00 20 osd.299 0 get_map
1050346 - loading and decoding 0x7f14b79c4000
-16> 2017-08-26 00:38:00.516927 7f14556b4d00 10 osd.299 0 add_map_bl
1050346 781955 bytes
-15> 2017-08-26 00:38:00.519047 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050347
-14> 2017-08-26 00:38:00.519054 7f14556b4d00 20 osd.299 0 get_map
1050347 - loading and decoding 0x7f14b79c4340
-13> 2017-08-26 00:38:00.519500 7f14556b4d00 10 osd.299 0 add_map_bl
1050347 781930 bytes
-12> 2017-08-26 00:38:00.521612 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050348
-11> 2017-08-26 00:38:00.521619 7f14556b4d00 20 osd.299 0 get_map
1050348 - loading and decoding 0x7f14b79c4680
-10> 2017-08-26 00:38:00.522074 7f14556b4d00 10 osd.299 0 add_map_bl
1050348 784883 bytes
-9> 2017-08-26 00:38:00.524245 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050349
-8> 2017-08-26 00:38:00.524252 7f14556b4d00 20 osd.299 0 get_map
1050349 - loading and decoding 0x7f14b79c49c0
-7> 2017-08-26 00:38:00.524706 7f14556b4d00 10 osd.299 0 add_map_bl
1050349 785081 bytes
-6> 2017-08-26 00:38:00.526854 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050350
-5> 2017-08-26 00:38:00.526861 7f14556b4d00 20 osd.299 0 get_map
1050350 - loading and decoding 0x7f14b79c4d00
-4> 2017-08-26 00:38:00.527330 7f14556b4d00 10 osd.299 0 add_map_bl
1050350 785948 bytes
-3> 2017-08-26 00:38:00.529505 7f14556b4d00 10 osd.299 1085665
build_past_intervals_parallel epoch 1050351
-2> 2017-08-26 00:38:00.529512 7f14556b4d00 20 osd.299 0 get_map
1050351 - loading and decoding 0x7f14b79c5040
-1> 2017-08-26 00:38:00.529979 7f14556b4d00 10 osd.299 0 add_map_bl
1050351 788650 bytes
0> 2017-08-26 00:38:00.534373 7f14556b4d00 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/osd_types.cc:
In function 'virtual void pi_compact_rep::add_interval(bool, const
PastIntervals::pg_interval_t&)' thread 7f14556b4d00 time 2017-08-26
00:38:00.532119
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/osd_types.cc:
3205: FAILED assert(interval.last > last)
ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4)
luminous (rc)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x110) [0x7f145612f420]
2: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t
const&)+0x3b2) [0x7f1455e030b2]
3: (PastIntervals::check_new_interval(int, int, std::vector<int,
std::allocator<int> > const&, std::vector<int, std::allocator<int> >
const&, int, int, std::vector<int, std::allocator<int> > const&,
std::vector<int, std::allocator<int> > const&, unsigned int, unsigned
int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t,
IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380)
[0x7f1455de8ab0]
4: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
5: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
6: (OSD::init()+0x2179) [0x7f1455bd7779]
7: (main()+0x2def) [0x7f1455add56f]
8: (__libc_start_main()+0xf5) [0x7f1451d14b35]
9: (()+0x4ac8a6) [0x7f1455b7b8a6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
20/20 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.299.log
--- end dump of recent events ---
2017-08-26 00:38:00.572479 7f14556b4d00 -1 *** Caught signal (Aborted) **
in thread 7f14556b4d00 thread_name:ceph-osd
ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4)
luminous (rc)
1: (()+0xa21a01) [0x7f14560f0a01]
2: (()+0xf370) [0x7f1452cfe370]
3: (gsignal()+0x37) [0x7f1451d281d7]
4: (abort()+0x148) [0x7f1451d298c8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x284) [0x7f145612f594]
6: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t
const&)+0x3b2) [0x7f1455e030b2]
7: (PastIntervals::check_new_interval(int, int, std::vector<int,
std::allocator<int> > const&, std::vector<int, std::allocator<int> >
const&, int, int, std::vector<int, std::allocator<int> > const&,
std::vector<int, std::allocator<int> > const&, unsigned int, unsigned
int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t,
IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380)
[0x7f1455de8ab0]
8: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
9: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
10: (OSD::init()+0x2179) [0x7f1455bd7779]
11: (main()+0x2def) [0x7f1455add56f]
12: (__libc_start_main()+0xf5) [0x7f1451d14b35]
13: (()+0x4ac8a6) [0x7f1455b7b8a6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-299/ --op info
--pgid 19.f1e
{
"pgid": "19.f1e",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 1084817,
"epoch_pool_created": 1084817,
"last_epoch_started": 1085232,
"last_interval_started": 1085230,
"last_epoch_clean": 1050342,
"last_interval_clean": 1050342,
"last_epoch_split": 0,
"last_epoch_marked_full": 1061015,
"same_up_since": 1085230,
"same_interval_since": 1085230,
"same_primary_since": 1085230,
"last_scrub": "960114'865853",
"last_scrub_stamp": "2017-08-25 17:32:06.181006",
"last_deep_scrub": "952725'861179",
"last_deep_scrub_stamp": "2017-08-25 17:32:06.181006",
"last_clean_scrub_stamp": "2017-08-25 17:32:06.181006"
},
"stats": {
"version": "0'0",
"reported_seq": "424",
"reported_epoch": "1085650",
"state": "active+undersized+degraded",
"last_fresh": "2017-08-25 18:52:46.520078",
"last_change": "2017-08-25 18:38:16.356266",
"last_active": "2017-08-25 18:52:46.520078",
"last_peered": "2017-08-25 18:52:46.520078",
"last_clean": "2017-08-25 17:32:06.181006",
"last_became_active": "2017-08-25 18:38:16.356266",
"last_became_peered": "2017-08-25 18:38:16.356266",
"last_unstale": "2017-08-25 18:52:46.520078",
"last_undegraded": "2017-08-25 18:38:16.304877",
"last_fullsized": "2017-08-25 18:38:16.304877",
"mapping_epoch": 1085230,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 1084817,
"last_epoch_clean": 1050342,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "960114'865853",
"last_scrub_stamp": "2017-08-25 17:32:06.181006",
"last_deep_scrub": "952725'861179",
"last_deep_scrub_stamp": "2017-08-25 17:32:06.181006",
"last_clean_scrub_stamp": "2017-08-25 17:32:06.181006",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
299
],
"acting": [
299
],
"blocked_by": [],
"up_primary": 299,
"acting_primary": 299
},
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 1085232,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
ll /var/lib/ceph/osd/ceph-299/current/19.f1e_head/
total 0
-rw-r--r-- 1 root root 0 Aug 25 18:38 __head_00000F1E__13
ceph pg 19.f1e query
.
.
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
299
],
"peering_blocked_by": [
{
"osd": 299,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may
let us proceed"
}
]
.
.
.
Removing the PG with ceph-objectstore-tool did the trick, but I am not
sure whether the same thing would happen with a PG that has real data in it.
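Concretely, what I ran was along these lines (a sketch only: the backup
path and the systemctl unit name are from my setup, and DRY_RUN=1, the
default here, only echoes the commands instead of running them):

```shell
#!/bin/sh
# Sketch: back up the stuck PG 19.f1e from osd.299, then remove it.
# The OSD must be stopped while ceph-objectstore-tool runs.
# NOTE: some releases require adding --force to --op remove.
OSD_PATH=${OSD_PATH:-/var/lib/ceph/osd/ceph-299}
PGID=${PGID:-19.f1e}
PLAN=""

run() {
    PLAN="$PLAN $*;"                      # keep a record of the plan
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

run systemctl stop ceph-osd@299
run ceph-objectstore-tool --data-path "$OSD_PATH" \
    --pgid "$PGID" --op export --file "/root/$PGID.export"
run ceph-objectstore-tool --data-path "$OSD_PATH" \
    --pgid "$PGID" --op remove
run systemctl start ceph-osd@299
```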
Should I report this in the bug tracker?
Thanks