Re: High memory usage kills OSD while peering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08/24/2017 06:58 AM, Sage Weil wrote:

Okay, so I think the combination of (1) removing empty PGs and (2) pruning
past_intervals will help.  (1) can be scripted by looking in
current/$pg_HEAD directories and picking out the ones with 0 or 1 objects
in them, doing ceph-objecstore-tool export to make a backup (just in
case), and then removing them (with ceph-objectstore-tool).  Be careful of
PGs for empty pools since those will be naturally empty (and you want
to keep them).

For (2), see the wip-prune-past-intervals-jewel branch in ceph-ci.git.. if
that is applied to the kraken branch it ought ot work (although it's
untested).  Alternatively, you can just upgrade to luminous, as it
implements a more sophisticated version of the same thing.  You need to
upgrade mons, mark all osds down, upgrade osds and start at least one of
them, and then set 'ceph osd require-osd-release luminous' before it'll
switch to the new past intervals representation.  Definitely test it on
your test cluster to ensure it reduces the memory usage!

If that doesn't sort things out we'll need to see a heap profile for an
OOMing OSD to make sure we know what is using all of the RAM...

sage


well, big thank you sage. we tested the upgrade on the test cluster. and it did work in an awesome way. and we did not even needed to remove any pg with the tool. I read about the prune thing in the release notes, so when our attempts failed to start the cluster, we tried upgrading, but it did not help. it turned out that we missed the 'ceph osd require-osd-release luminous' thing. I mean we was looking on the command in the release notes upgrade section, and said to each other "it dose not matter, it would only restrict the old osds from joining" and moved on. damn, we would be up a week ago. having said that, I think the release notes should highlight this in the future.

now we have upgraded the production cluster, and it is up and running now, memory foot print was down to the tenth. the largest ram using osd i saw was about 6.5GB. but we faced some issues, particularly OSDs crashing with "FAILED assert(interval.last > last)"


logs:

-34> 2017-08-26 00:38:00.505114 7f14556b4d00 0 osd.299 1085665 load_pgs opened 455 pgs -33> 2017-08-26 00:38:00.505787 7f14556b4d00 10 osd.299 1085665 19.f1e needs 1050342-1085230 -32> 2017-08-26 00:38:00.505814 7f14556b4d00 1 osd.299 1085665 build_past_intervals_parallel over 1050342-1085230 -31> 2017-08-26 00:38:00.505818 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050342 -30> 2017-08-26 00:38:00.505824 7f14556b4d00 20 osd.299 0 get_map 1050342 - loading and decoding 0x7f14b3dfb0c0 -29> 2017-08-26 00:38:00.506245 7f14556b4d00 10 osd.299 0 add_map_bl 1050342 780781 bytes -28> 2017-08-26 00:38:00.508539 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050342 pg 19.f1e first map, acting [80] up [80], same_interval_since = 1050342 -27> 2017-08-26 00:38:00.508547 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050343 -26> 2017-08-26 00:38:00.508550 7f14556b4d00 20 osd.299 0 get_map 1050343 - loading and decoding 0x7f14b3dfad80 -25> 2017-08-26 00:38:00.508997 7f14556b4d00 10 osd.299 0 add_map_bl 1050343 781371 bytes -24> 2017-08-26 00:38:00.511176 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050344 -23> 2017-08-26 00:38:00.511196 7f14556b4d00 20 osd.299 0 get_map 1050344 - loading and decoding 0x7f14b3dfb740 -22> 2017-08-26 00:38:00.511625 7f14556b4d00 10 osd.299 0 add_map_bl 1050344 782446 bytes -21> 2017-08-26 00:38:00.513813 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050345 -20> 2017-08-26 00:38:00.513820 7f14556b4d00 20 osd.299 0 get_map 1050345 - loading and decoding 0x7f14b3dfba80 -19> 2017-08-26 00:38:00.514260 7f14556b4d00 10 osd.299 0 add_map_bl 1050345 782071 bytes -18> 2017-08-26 00:38:00.516463 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050346 -17> 2017-08-26 00:38:00.516488 7f14556b4d00 20 osd.299 0 get_map 1050346 - loading and decoding 0x7f14b79c4000 -16> 2017-08-26 00:38:00.516927 7f14556b4d00 10 osd.299 0 add_map_bl 1050346 781955 bytes -15> 2017-08-26 00:38:00.519047 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050347 -14> 2017-08-26 00:38:00.519054 7f14556b4d00 20 osd.299 0 get_map 1050347 - loading and decoding 0x7f14b79c4340 -13> 2017-08-26 00:38:00.519500 7f14556b4d00 10 osd.299 0 add_map_bl 1050347 781930 bytes -12> 2017-08-26 00:38:00.521612 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050348 -11> 2017-08-26 00:38:00.521619 7f14556b4d00 20 osd.299 0 get_map 1050348 - loading and decoding 0x7f14b79c4680 -10> 2017-08-26 00:38:00.522074 7f14556b4d00 10 osd.299 0 add_map_bl 1050348 784883 bytes -9> 2017-08-26 00:38:00.524245 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050349 -8> 2017-08-26 00:38:00.524252 7f14556b4d00 20 osd.299 0 get_map 1050349 - loading and decoding 0x7f14b79c49c0 -7> 2017-08-26 00:38:00.524706 7f14556b4d00 10 osd.299 0 add_map_bl 1050349 785081 bytes -6> 2017-08-26 00:38:00.526854 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050350 -5> 2017-08-26 00:38:00.526861 7f14556b4d00 20 osd.299 0 get_map 1050350 - loading and decoding 0x7f14b79c4d00 -4> 2017-08-26 00:38:00.527330 7f14556b4d00 10 osd.299 0 add_map_bl 1050350 785948 bytes -3> 2017-08-26 00:38:00.529505 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050351 -2> 2017-08-26 00:38:00.529512 7f14556b4d00 20 osd.299 0 get_map 1050351 - loading and decoding 0x7f14b79c5040 -1> 2017-08-26 00:38:00.529979 7f14556b4d00 10 osd.299 0 add_map_bl 1050351 788650 bytes 0> 2017-08-26 00:38:00.534373 7f14556b4d00 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/osd_types.cc: In function 'virtual void pi_compact_rep::add_interval(bool, const PastIntervals::pg_interval_t&)' thread 7f14556b4d00 time 2017-08-26 00:38:00.532119 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/osd_types.cc: 3205: FAILED assert(interval.last > last)

ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f145612f420] 2: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x3b2) [0x7f1455e030b2] 3: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380) [0x7f1455de8ab0]
 4: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
 5: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
 6: (OSD::init()+0x2179) [0x7f1455bd7779]
 7: (main()+0x2def) [0x7f1455add56f]
 8: (__libc_start_main()+0xf5) [0x7f1451d14b35]
 9: (()+0x4ac8a6) [0x7f1455b7b8a6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
  20/20 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.299.log
--- end dump of recent events ---
2017-08-26 00:38:00.572479 7f14556b4d00 -1 *** Caught signal (Aborted) **
 in thread 7f14556b4d00 thread_name:ceph-osd

ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
 1: (()+0xa21a01) [0x7f14560f0a01]
 2: (()+0xf370) [0x7f1452cfe370]
 3: (gsignal()+0x37) [0x7f1451d281d7]
 4: (abort()+0x148) [0x7f1451d298c8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f145612f594] 6: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x3b2) [0x7f1455e030b2] 7: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380) [0x7f1455de8ab0]
 8: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
 9: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
 10: (OSD::init()+0x2179) [0x7f1455bd7779]
 11: (main()+0x2def) [0x7f1455add56f]
 12: (__libc_start_main()+0xf5) [0x7f1451d14b35]
 13: (()+0x4ac8a6) [0x7f1455b7b8a6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2017-08-26 00:38:00.572479 7f14556b4d00 -1 *** Caught signal (Aborted) **
 in thread 7f14556b4d00 thread_name:ceph-osd

ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
 1: (()+0xa21a01) [0x7f14560f0a01]
 2: (()+0xf370) [0x7f1452cfe370]
 3: (gsignal()+0x37) [0x7f1451d281d7]
 4: (abort()+0x148) [0x7f1451d298c8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f145612f594] 6: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x3b2) [0x7f1455e030b2] 7: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380) [0x7f1455de8ab0]
 8: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
 9: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
 10: (OSD::init()+0x2179) [0x7f1455bd7779]
 11: (main()+0x2def) [0x7f1455add56f]
 12: (__libc_start_main()+0xf5) [0x7f1451d14b35]
 13: (()+0x4ac8a6) [0x7f1455b7b8a6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
  20/20 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.299.log
--- end dump of recent events ---


ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-299/ --op info --pgid 19.f1e
{
    "pgid": "19.f1e",
    "last_update": "0'0",
    "last_complete": "0'0",
    "log_tail": "0'0",
    "last_user_version": 0,
    "last_backfill": "MAX",
    "last_backfill_bitwise": 0,
    "purged_snaps": [],
    "history": {
        "epoch_created": 1084817,
        "epoch_pool_created": 1084817,
        "last_epoch_started": 1085232,
        "last_interval_started": 1085230,
        "last_epoch_clean": 1050342,
        "last_interval_clean": 1050342,
        "last_epoch_split": 0,
        "last_epoch_marked_full": 1061015,
        "same_up_since": 1085230,
        "same_interval_since": 1085230,
        "same_primary_since": 1085230,
        "last_scrub": "960114'865853",
        "last_scrub_stamp": "2017-08-25 17:32:06.181006",
        "last_deep_scrub": "952725'861179",
        "last_deep_scrub_stamp": "2017-08-25 17:32:06.181006",
        "last_clean_scrub_stamp": "2017-08-25 17:32:06.181006"
    },
    "stats": {
        "version": "0'0",
        "reported_seq": "424",
        "reported_epoch": "1085650",
        "state": "active+undersized+degraded",
        "last_fresh": "2017-08-25 18:52:46.520078",
        "last_change": "2017-08-25 18:38:16.356266",
        "last_active": "2017-08-25 18:52:46.520078",
        "last_peered": "2017-08-25 18:52:46.520078",
        "last_clean": "2017-08-25 17:32:06.181006",
        "last_became_active": "2017-08-25 18:38:16.356266",
        "last_became_peered": "2017-08-25 18:38:16.356266",
        "last_unstale": "2017-08-25 18:52:46.520078",
        "last_undegraded": "2017-08-25 18:38:16.304877",
        "last_fullsized": "2017-08-25 18:38:16.304877",
        "mapping_epoch": 1085230,
        "log_start": "0'0",
        "ondisk_log_start": "0'0",
        "created": 1084817,
        "last_epoch_clean": 1050342,
        "parent": "0.0",
        "parent_split_bits": 0,
        "last_scrub": "960114'865853",
        "last_scrub_stamp": "2017-08-25 17:32:06.181006",
        "last_deep_scrub": "952725'861179",
        "last_deep_scrub_stamp": "2017-08-25 17:32:06.181006",
        "last_clean_scrub_stamp": "2017-08-25 17:32:06.181006",
        "log_size": 0,
        "ondisk_log_size": 0,
        "stats_invalid": false,
        "dirty_stats_invalid": false,
        "omap_stats_invalid": false,
        "hitset_stats_invalid": false,
        "hitset_bytes_stats_invalid": false,
        "pin_stats_invalid": false,
        "stat_sum": {
            "num_bytes": 0,
            "num_objects": 0,
            "num_object_clones": 0,
            "num_object_copies": 0,
            "num_objects_missing_on_primary": 0,
            "num_objects_missing": 0,
            "num_objects_degraded": 0,
            "num_objects_misplaced": 0,
            "num_objects_unfound": 0,
            "num_objects_dirty": 0,
            "num_whiteouts": 0,
            "num_read": 0,
            "num_read_kb": 0,
            "num_write": 0,
            "num_write_kb": 0,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 0,
            "num_bytes_recovered": 0,
            "num_keys_recovered": 0,
            "num_objects_omap": 0,
            "num_objects_hit_set_archive": 0,
            "num_bytes_hit_set_archive": 0,
            "num_flush": 0,
            "num_flush_kb": 0,
            "num_evict": 0,
            "num_evict_kb": 0,
            "num_promote": 0,
            "num_flush_mode_high": 0,
            "num_flush_mode_low": 0,
            "num_evict_mode_some": 0,
            "num_evict_mode_full": 0,
            "num_objects_pinned": 0,
            "num_legacy_snapsets": 0
        },
        "up": [
            299
        ],
        "acting": [
            299
        ],
        "blocked_by": [],
        "up_primary": 299,
        "acting_primary": 299
    },
    "empty": 1,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 1085232,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}


ll /var/lib/ceph/osd/ceph-299/current/19.f1e_head/
total 0
-rw-r--r-- 1 root root 0 Aug 25 18:38 __head_00000F1E__13

ceph pg 19.f1e query
.
.
            "blocked": "peering is blocked due to down osds",
            "down_osds_we_would_probe": [
                299
            ],
            "peering_blocked_by": [
                {
                    "osd": 299,
                    "current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"
                }
            ]
.
.
.



removing the pg with ceph-objectstore-tool did the trick. but i am not sure if that will happen to a pg with real data in it or not.
should i report this in the bug tracker?

thanks
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux