Re: High memory usage kills OSD while peering

Hi again,
Now almost everything is sorted out. We had a few inconsistent shards that were killing the OSDs during recovery; we fixed some of them by removing the bad shards, and some by starting other OSDs that held good copies. What is stopping us now is that one OSD has a corrupted leveldb and refuses to start. I am not sure how that happened, but I assume it is due to the many times the node/OSD died from lack of memory. I am also not sure whether we should continue the discussion here or start a new thread.
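For reference, removing a bad pg copy (shard) from an OSD was done with that OSD stopped, roughly along these lines (from memory, so treat the exact flags as approximate; <id> and <pgid> are placeholders):

systemctl stop ceph-osd@<id>
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
    --pgid <pgid> --op remove
systemctl start ceph-osd@<id>

(depending on the build, --op remove may also want --force)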

osd.262 shows the following log lines on start:

2017-08-26 17:07:17.915861 7fbd8e4cbd00  0 set uid:gid to 0:0 (:)
2017-08-26 17:07:17.915875 7fbd8e4cbd00  0 ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc), process (unknown), pid 26713
2017-08-26 17:07:17.927085 7fbd8e4cbd00  0 pidfile_write: ignore empty --pid-file
2017-08-26 17:07:17.951358 7fbd8e4cbd00  0 load: jerasure load: lrc load: isa
2017-08-26 17:07:17.951602 7fbd8e4cbd00  0 filestore(/var/lib/ceph/osd/ceph-262) backend xfs (magic 0x58465342)
2017-08-26 17:07:17.952164 7fbd8e4cbd00  0 filestore(/var/lib/ceph/osd/ceph-262) backend xfs (magic 0x58465342)
2017-08-26 17:07:17.952977 7fbd8e4cbd00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-262) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-08-26 17:07:17.952983 7fbd8e4cbd00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-262) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-08-26 17:07:17.952985 7fbd8e4cbd00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-262) detect_features: splice() is disabled via 'filestore splice' config option
2017-08-26 17:07:17.953309 7fbd8e4cbd00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-262) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-08-26 17:07:17.953797 7fbd8e4cbd00  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-262) detect_feature: extsize is disabled by conf
2017-08-26 17:07:17.954628 7fbd8e4cbd00  0 filestore(/var/lib/ceph/osd/ceph-262) start omap initiation
2017-08-26 17:07:17.957166 7fbd8e4cbd00 -1 filestore(/var/lib/ceph/osd/ceph-262) mount(1724): Error initializing leveldb : Corruption: error in middle of record

2017-08-26 17:07:17.957179 7fbd8e4cbd00 -1 osd.262 0 OSD:init: unable to mount object store
2017-08-26 17:07:17.957183 7fbd8e4cbd00 -1 ** ERROR: osd init failed: (1) Operation not permitted

ceph-objectstore-tool shows similar errors.
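For reference, an invocation along these lines (the exact op we used may have differed) fails with a similar leveldb corruption error:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-262 \
    --journal-path /var/lib/ceph/osd/ceph-262/journal \
    --op list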

So we figured it is only one OSD and we can live without it, and we marked it lost. The pgs started to peer and went active, but 5 remain in the incomplete state.
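For reference, the OSD was marked lost with the standard command:

ceph osd lost 262 --yes-i-really-mean-it

A pg query (ceph pg <pgid> query) on one of the incomplete pgs shows: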

...
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2017-08-26 22:59:03.044623",
            "comment": "not enough complete instances of this PG"
        },
        {
            "name": "Started/Primary/Peering",
            "enter_time": "2017-08-26 22:59:02.540748",
            "past_intervals": [
                {
                    "first": "959669",
                    "last": "1090812",
                    "all_participants": [
                        {
                            "osd": 258
                        },
                        {
                            "osd": 262
                        },
                        {
                            "osd": 338
                        },
                        {
                            "osd": 545
                        },
                        {
                            "osd": 549
                        }
                    ],
                    "intervals": [
                        {
                            "first": "964880",
                            "last": "964924",
                            "acting": "262"
                        },
                        {
                            "first": "978855",
                            "last": "978956",
                            "acting": "545"
                        },
                        {
                            "first": "989628",
                            "last": "989808",
                            "acting": "258"
                        },
                        {
                            "first": "992614",
                            "last": "992975",
                            "acting": "549"
                        },
                        {
                            "first": "1085148",
                            "last": "1090812",
                            "acting": "338"
                        }
                    ]
                }
            ],
            "probing_osds": [
                "258",
                "338",
                "545",
                "549"
            ],
            "down_osds_we_would_probe": [
                262
            ],
            "peering_blocked_by": [],
            "peering_blocked_by_detail": [
                {
                    "detail": "peering_blocked_by_history_les_bound"
                }
            ]
        },
...

I am not sure what the "peering_blocked_by_history_les_bound" detail means, or how to proceed; googling it turned up nothing useful. All the incomplete pgs show the same detail as above and a similar recovery state.

ceph pg ls | grep incomplete
18.54b 0 0 0 0 0 0 2739 2739 incomplete 2017-08-26 23:15:46.705071 46889'4277 1091150:314001 [332,253] 332 [332,253] 332 46889'4277 2017-08-04 03:15:58.381025 46889'4277 2017-07-29 06:47:30.337673

19.54a 5950 0 0 0 0 26108435266 3019 3019 incomplete 2017-08-26 23:15:46.705156 961411'873129 1091150:58116482 [332,253] 332 [332,253] 332 960118'872495 2017-08-04 03:12:33.647414 952850'868978 2017-07-02 15:53:08.565948
19.608 0 0 0 0 0 0 0 0 incomplete 2017-08-26 22:59:03.044649 0'0 1091150:428 [258,338] 258 [258,338] 258 960118'862299 2017-08-04 03:01:57.011411 958900'861456 2017-07-28 02:33:29.476119
19.8bb 0 0 0 0 0 0 0 0 incomplete 2017-08-26 22:59:02.946453 0'0 1091150:339 [260,331] 260 [260,331] 260 960114'866811 2017-08-03 04:51:42.117840 952850'864443 2017-07-08 02:48:37.958357
19.dd3 5864 0 0 0 0 25600089555 3094 3094 incomplete 2017-08-26 17:20:07.948285 961411'865657 1091150:72381143 [263,142] 263 [263,142] 263 960118'865078 2017-08-25 17:32:06.181006 960118'865078 2017-08-25 17:32:06.181006


I also noticed that some of those pgs report 0 objects even though the pg directory on one of the OSDs does contain objects (quick check below).
These pools are replicated with size 2.
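The quick check was just listing the pg's _head directory under the filestore data path on the OSD that should hold it, e.g. (osd id and pgid here are only examples, and the objects may sit in nested DIR_* subdirectories):

ls /var/lib/ceph/osd/ceph-258/current/19.608_head/ | head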


thanks
ali