This will cap a single BlueFS space allocation. Currently it attempts to allocate 70 GB, which seems to overflow some 32-bit length fields. With the adjustment such an allocation should be capped at ~700 MB.
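For reference, the rough arithmetic behind those numbers as I read them (assuming the gift is computed as gift_ratio times the main device size and that the default bluestore_bluefs_gift_ratio is 0.02 - both assumptions on my part, which would imply a ~3.5 TB main device):

    0.02   * ~3.5 TB = ~70 GB    (the current attempt, large enough to overflow a 32-bit length)
    0.0002 * ~3.5 TB = ~700 MB   (with the suggested value, comfortably below the 4 GB limit)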
I doubt there is any relation between this specific failure and the pool. At least at the moment.

In short, the history is: the starting OSD tries to flush BlueFS data to disk, detects a lack of space and asks the main device for more - the allocation succeeds, but the returned extent has its length field set to 0.
On 7/9/2019 8:33 PM, Brett Chancellor wrote:

What does bluestore_bluefs_gift_ratio do? I can't find any documentation on it. Also, do you think this could be related to the .rgw.meta pool having too many objects per PG? The disks that die always seem to be backfilling a PG from that pool, and they have ~550k objects per PG.

-Brett
Please try to set bluestore_bluefs_gift_ratio to 0.0002
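In case a concrete command is useful - this is just how I would apply it on Nautilus with the centralized config, not a command quoted from elsewhere in the thread (osd.N is a placeholder for the affected OSD; using just "osd" applies it to all OSDs):

ceph config set osd.N bluestore_bluefs_gift_ratio 0.0002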
On 7/9/2019 7:39 PM, Brett Chancellor wrote:
Too large for pastebin. The problem is continually crashing new OSDs. Here is the latest one.
Could you please set debug bluestore to 20 and collect a startup log for this specific OSD once again?
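If a concrete recipe helps - this is just the way I would do it (assuming the Nautilus centralized config; osd.N and the log path are placeholders/defaults, not details from this thread):

ceph config set osd.N debug_bluestore 20/20
# restart the OSD, then collect /var/log/ceph/ceph-osd.N.log from its host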
On 7/9/2019 6:29 PM, Brett Chancellor wrote:
I restarted most of the OSDs with the stupid allocator (6 of them wouldn't start unless the bitmap allocator was set), but I'm still seeing issues with OSDs crashing. Interestingly, it seems that the dying OSDs are always working on a PG from the .rgw.meta pool when they crash.
Hi Brett,

in Nautilus you can do that via

ceph config set osd.N bluestore_allocator stupid
ceph config set osd.N bluefs_allocator stupid

See https://ceph.com/community/new-mimic-centralized-configuration-management/ for more details on the new way of setting configuration options.
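For what it's worth - this is my own habit rather than anything from the thread - I would confirm that the value took hold and then restart the OSD, since the allocator choice is only picked up at startup:

ceph config get osd.N bluestore_allocator
systemctl restart ceph-osd@N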
A known issue with the stupid allocator is a gradual increase in write request latency (occurring within several days after an OSD restart). It's seldom observed, though. There were some posts about that behavior on the mailing list this year.
Thanks,
Igor.
On 7/8/2019 8:33 PM, Brett Chancellor wrote:
I'll give that a try. Is it something like...

ceph tell 'osd.*' bluestore_allocator stupid
ceph tell 'osd.*' bluefs_allocator stupid

And should I expect any issues doing this?
I should have read the call stack more carefully... It's not about lacking free space - this is rather the bug from this ticket: http://tracker.ceph.com/issues/40080

You should upgrade to v14.2.2 (once it's available) or temporarily switch to the stupid allocator as a workaround.
Thanks,
Igor
On 7/8/2019 8:00 PM, Igor Fedotov wrote:
Hi Brett,

looks like BlueStore is unable to allocate additional space for BlueFS at the main device. It's either lacking free space or it's too fragmented...

Would you share the OSD log, please?

Also please run "ceph-bluestore-tool --path <substitute with path-to-osd!!!> bluefs-bdev-sizes" and share the output.
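For example - my own spelling-out of that command, with the osd-80 path taken from the commands later in the thread, and noting that ceph-bluestore-tool needs the OSD to be stopped so it can open the store:

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 bluefs-bdev-sizes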
Thanks,
Igor
On 7/3/2019 9:59 PM, Brett Chancellor wrote:
Hi All! Today I've had 3 OSDs stop themselves, and they are unable to restart, all with the same error. These OSDs are all on different hosts. All are running 14.2.1.
I did try the following two commands:

- ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
  ## This failed with the same error below

- ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
  ## After a couple of hours returned...
2019-07-03 18:30:02.095 7fe7c1c1ef00 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record found, suggest to run store repair to get consistent statistic reports
fsck success
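As an aside - this is only my reading of that warning, not advice from anyone in the thread - the legacy statfs record can presumably be cleaned up by a repair run against the stopped OSD, e.g.:

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 repair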
## Error when trying to start one of the OSDs
-12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal (Aborted) **
 in thread 7f5e42366700 thread_name:rocksdb:low0

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0xf5d0) [0x7f5e50bd75d0]
 2: (gsignal()+0x37) [0x7f5e4f9ce207]
 3: (abort()+0x148) [0x7f5e4f9cf8f8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x55a7aaee96ab]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55a7aaee982a]
 6: (interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*, unsigned long*)+0x3c6) [0x55a7ab212a66]
 7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x74e) [0x55a7ab48253e]
 8: (BlueFS::_expand_slow_device(unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >&)+0x111) [0x55a7ab59e921]
 9: (BlueFS::_allocate(unsigned char, unsigned long, bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
 10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0xe5) [0x55a7ab59fce5]
 11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
 12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
 13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
 14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
 15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xbaa) [0x55a7abc3b73a]
 16: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0) [0x55a7abc3f150]
 17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
 18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7) [0x55a7aba7fb67]
 19: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
 20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
 21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264) [0x55a7abc8d9c4]
 22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55a7abc8db4f]
 23: (()+0x129dfff) [0x55a7abd1afff]
 24: (()+0x7dd5) [0x7f5e50bcfdd5]
 25: (clone()+0x6d) [0x7f5e4fa95ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com