Re: 3 OSDs stopped and unable to restart

Brett Chancellor <bchancellor@xxxxxxxxxxxxxx> · Tue, 9 Jul 2019 13:33:20 -0400

What does bluestore_bluefs_gift_ratio do? I can't find any documentation on it. Also, do you think this could be related to the .rgw.meta pool having too many objects per PG? The disks that die always seem to be backfilling a PG from that pool, and its PGs hold ~550k objects each.
-Brett

On Tue, Jul 9, 2019 at 1:03 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:

    Please try to set bluestore_bluefs_gift_ratio to 0.0002
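For readers following along: judging by the option's name and its short description in the Ceph source, bluestore_bluefs_gift_ratio appears to be the fraction of free space on the main device that BlueStore hands ("gifts") to BlueFS at one time. Treat that reading as an assumption, not official documentation. Below is a minimal sketch of applying the suggested value on Nautilus through the centralized config, assuming OSD id 80 (the one from the original report) and that the OSD picks the value up on its next start. The helper name gift_ratio_cmds is hypothetical, and it only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that set and then read back
# bluestore_bluefs_gift_ratio for one OSD. Nothing is executed here.
gift_ratio_cmds() {
    osd_id="$1"
    ratio="${2:-0.0002}"   # value suggested in this thread
    echo "ceph config set osd.${osd_id} bluestore_bluefs_gift_ratio ${ratio}"
    echo "ceph config get osd.${osd_id} bluestore_bluefs_gift_ratio"
}

# Review the printed commands, then run them on a node with an admin keyring.
gift_ratio_cmds 80
```

Printing instead of executing is a deliberate choice here: the override lands in the monitors' config database, so it is worth double-checking the OSD id before applying it.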

    On 7/9/2019 7:39 PM, Brett Chancellor wrote:

      Too large for pastebin. The problem keeps crashing additional OSDs. Here is the latest one.

        On Tue, Jul 9, 2019 at 11:46 AM Igor Fedotov <ifedotov@xxxxxxx> wrote:

            Could you please set debug bluestore to 20 and collect the startup log for this specific OSD once again?
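One possible way to carry out this request on Nautilus is sketched below. It assumes OSD id 80, a systemd-managed OSD, and the default log location /var/log/ceph/ceph-osd.<id>.log; none of these are confirmed by the thread beyond the OSD id. The helper name debug_startup_cmds is hypothetical, and it only prints the steps so they can be run one at a time.

```shell
#!/bin/sh
# Hypothetical helper: print the steps for capturing a verbose BlueStore
# startup log for one OSD. Unit name and log path assume a default,
# systemd-managed package install; adjust for your deployment.
debug_startup_cmds() {
    osd_id="$1"
    echo "ceph config set osd.${osd_id} debug_bluestore 20/20"
    echo "systemctl restart ceph-osd@${osd_id}"
    echo "cp /var/log/ceph/ceph-osd.${osd_id}.log /tmp/ceph-osd.${osd_id}.debug20.log"
    # Drop the override afterwards; level 20 logs grow very quickly.
    echo "ceph config rm osd.${osd_id} debug_bluestore"
}

debug_startup_cmds 80
```

Run the printed steps by hand, waiting for the OSD to crash (or finish starting) before copying the log.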

            On 7/9/2019 6:29 PM, Brett Chancellor wrote:

              I restarted most of the OSDs with the stupid allocator (6 of them wouldn't start unless the bitmap allocator was set), but I'm still seeing issues with OSDs crashing. Interestingly, it seems that the dying OSDs are always working on a PG from the .rgw.meta pool when they crash.

                Log : https://pastebin.com/yuJKcPvX

                On Tue, Jul 9, 2019 at 5:14 AM Igor Fedotov <ifedotov@xxxxxxx> wrote:

                    Hi Brett,
                    In Nautilus you can do that via:
                    ceph config set osd.N bluestore_allocator stupid
                    ceph config set osd.N bluefs_allocator stupid
                    See https://ceph.com/community/new-mimic-centralized-configuration-management/ for more details on the new way of setting configuration options.
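The two commands above have to be repeated per OSD, so a small loop can help. The sketch below is an illustration only: the helper name allocator_cmds is hypothetical, and the ids 81 and 82 are made-up examples (only osd.80 appears in this thread). It prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print the allocator overrides for each given OSD id.
# Usage: allocator_cmds <allocator> <osd-id>...
allocator_cmds() {
    allocator="$1"; shift
    for osd_id in "$@"; do
        echo "ceph config set osd.${osd_id} bluestore_allocator ${allocator}"
        echo "ceph config set osd.${osd_id} bluefs_allocator ${allocator}"
    done
}

# Example ids only. The thread applies the change by restarting OSDs,
# so plan a restart of each affected OSD afterwards.
allocator_cmds stupid 80 81 82
```

Passing bitmap instead of stupid prints the commands for switching back.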

                    A known issue with the stupid allocator is a gradual increase in write request latency (appearing within several days after an OSD restart). It is seldom observed, though. There were some posts about that behavior on the mailing list this year.

                    Thanks,
                    Igor.

                    On 7/8/2019 8:33 PM, Brett Chancellor wrote:

                      I'll give that a try. Is it something like...
                        ceph tell 'osd.*' bluestore_allocator stupid
                        ceph tell 'osd.*' bluefs_allocator stupid

                        And should I expect any issues doing this?

                        On Mon, Jul 8, 2019 at 1:04 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:

                            I should have read the call stack more carefully... It's not about lacking free space; this is rather the bug from this ticket:
                            http://tracker.ceph.com/issues/40080

                            You should upgrade to v14.2.2 (once it's available) or temporarily switch to the stupid allocator as a workaround.

                            Thanks,
                            Igor

                            On 7/8/2019 8:00 PM, Igor Fedotov wrote:

                              Hi Brett,
                              It looks like BlueStore is unable to allocate additional space for BlueFS on the main device. Either it is lacking free space or the free space is too fragmented...
                              Would you share the OSD log, please?
                              Also, please run "ceph-bluestore-tool --path <substitute with path-to-osd!!!> bluefs-bdev-sizes" and share the output.
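As an illustration of filling in the path placeholder, the sketch below builds the command for one OSD. It assumes the default data directory layout /var/lib/ceph/osd/ceph-<id> (the layout used for osd.80 elsewhere in this thread); the helper name bdev_sizes_cmd is hypothetical, and it only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: build the bluefs-bdev-sizes command line for one OSD.
# Assumes the default data path /var/lib/ceph/osd/ceph-<id>.
bdev_sizes_cmd() {
    osd_id="$1"
    echo "ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${osd_id} bluefs-bdev-sizes"
}

# Run the printed command on the OSD's host while that OSD is not running.
bdev_sizes_cmd 80
```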

                              Thanks,
                              Igor
                              On 7/3/2019 9:59 PM, Brett Chancellor wrote:

                                Hi All! Today 3 of my OSDs stopped themselves and are unable to restart, all with the same error. These OSDs are all on different hosts. All are running 14.2.1.

                                  I did try the following two commands:
                                  - ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
                                    ## This failed with the same error below
                                  - ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
                                    ## After a couple of hours returned...
                                    2019-07-03 18:30:02.095 7fe7c1c1ef00 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record found, suggest to run store repair to get consistent statistic reports
                                    fsck success

                                    ## Error when trying to start one of the OSDs
                                       -12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal (Aborted) **
                                     in thread 7f5e42366700 thread_name:rocksdb:low0

                                     ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
                                     1: (()+0xf5d0) [0x7f5e50bd75d0]
                                     2: (gsignal()+0x37) [0x7f5e4f9ce207]
                                     3: (abort()+0x148) [0x7f5e4f9cf8f8]
                                     4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x55a7aaee96ab]
                                     5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55a7aaee982a]
                                     6: (interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*, unsigned long*)+0x3c6) [0x55a7ab212a66]
                                     7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x74e) [0x55a7ab48253e]
                                     8: (BlueFS::_expand_slow_device(unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >&)+0x111) [0x55a7ab59e921]
                                     9: (BlueFS::_allocate(unsigned char, unsigned long, bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
                                     10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0xe5) [0x55a7ab59fce5]
                                     11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
                                     12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
                                     13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
                                     14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
                                     15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xbaa) [0x55a7abc3b73a]
                                     16: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0) [0x55a7abc3f150]
                                     17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
                                     18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7) [0x55a7aba7fb67]
                                     19: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
                                     20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
                                     21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264) [0x55a7abc8d9c4]
                                     22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55a7abc8db4f]
                                     23: (()+0x129dfff) [0x55a7abd1afff]
                                     24: (()+0x7dd5) [0x7f5e50bcfdd5]
                                     25: (clone()+0x6d) [0x7f5e4fa95ead]
                                     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

                                _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
