Re: mon service failed to start

Behnam Loghmani <behnam.loghmani@xxxxxxxxx> · Wed, 21 Feb 2018 21:16:32 +0330

Hi there,

I changed the SATA port and cable of the SSD, updated Ceph to version 12.2.3, and rebuilt the OSDs,
but when recovery starts the OSDs fail with this error:

2018-02-21 21:12:18.037974 7f3479fe2d00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x84c097b0, expected 0xaf1040a2, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2018-02-21 21:12:18.038002 7f3479fe2d00 -1 osd.7 0 OSD::init() : unable to read osd superblock
2018-02-21 21:12:18.038009 7f3479fe2d00  1 bluestore(/var/lib/ceph/osd/ceph-7) umount
2018-02-21 21:12:18.038282 7f3479fe2d00  1 stupidalloc 0x0x55e99236c620 shutdown
2018-02-21 21:12:18.038308 7f3479fe2d00  1 freelist shutdown
2018-02-21 21:12:18.038336 7f3479fe2d00  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.3/rpm/el7/BUILD/ceph-12.2.3/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2018-02-21 21:12:18.041561 7f3465561700  4 rocksdb: (Original Log Time 2018/02/21-21:12:18.041514) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.3/rpm/el7/BUILD/ceph-12.2.3/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[5 0 0 0 0 0 0] max score 0.00, MB/sec: 2495.2 rd, 10.1 wr, level 1, files in(5, 0) out(1) MB in(213.6, 0.0) out(0.9), read-write-amplify(1.0) write-amplify(0.0) Shutdown in progress: Database shutdown or Column 
2018-02-21 21:12:18.041569 7f3465561700  4 rocksdb: (Original Log Time 2018/02/21-21:12:18.041545) EVENT_LOG_v1 {"time_micros": 1519234938041530, "job": 3, "event": "compaction_finished", "compaction_time_micros": 89747, "output_level": 1, "num_output_files": 1, "total_output_size": 902552, "num_input_records": 4470, "num_output_records": 4377, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 44, "lsm_state": [5, 0, 0, 0, 0, 0, 0]}
2018-02-21 21:12:18.041663 7f3479fe2d00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1519234938041657, "job": 4, "event": "table_file_deletion", "file_number": 249}
2018-02-21 21:12:18.042144 7f3479fe2d00  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.3/rpm/el7/BUILD/ceph-12.2.3/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2018-02-21 21:12:18.043474 7f3479fe2d00  1 bluefs umount
2018-02-21 21:12:18.043775 7f3479fe2d00  1 stupidalloc 0x0x55e991f05d40 shutdown
2018-02-21 21:12:18.043784 7f3479fe2d00  1 stupidalloc 0x0x55e991f05db0 shutdown
2018-02-21 21:12:18.043786 7f3479fe2d00  1 stupidalloc 0x0x55e991f05e20 shutdown
2018-02-21 21:12:18.043826 7f3479fe2d00  1 bdev(0x55e992254600 /dev/vg0/wal-b) close
2018-02-21 21:12:18.301531 7f3479fe2d00  1 bdev(0x55e992255800 /dev/vg0/db-b) close
2018-02-21 21:12:18.545488 7f3479fe2d00  1 bdev(0x55e992254400 /var/lib/ceph/osd/ceph-7/block) close
2018-02-21 21:12:18.650473 7f3479fe2d00  1 bdev(0x55e992254000 /var/lib/ceph/osd/ceph-7/block) close
2018-02-21 21:12:18.900003 7f3479fe2d00 -1  ** ERROR: osd init failed: (22) Invalid argument
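
For anyone comparing several of these failures, here is a minimal Python sketch that pulls the checksum fields out of the `_verify_csum` line above. The regular expression is written against this one log line only, and treating the extent length after `~` as hexadecimal is an assumption; the helper name `parse_csum_error` is hypothetical and not part of any Ceph tooling.

```python
import re

# Pattern written against the single "_verify_csum" line quoted in this thread.
CSUM_RE = re.compile(
    r"_verify_csum bad (?P<algo>\w+)/(?P<chunk>0x[0-9a-f]+) checksum "
    r"at blob offset (?P<blob_off>0x[0-9a-f]+), "
    r"got (?P<got>0x[0-9a-f]+), expected (?P<expected>0x[0-9a-f]+), "
    r"device location \[(?P<dev_off>0x[0-9a-f]+)~(?P<dev_len>[0-9a-f]+)\]"
)


def parse_csum_error(line):
    """Return the checksum-failure fields as a dict, or None if the line does not match."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    f = m.groupdict()
    return {
        "algo": f["algo"],
        "chunk_size": int(f["chunk"], 16),
        "got": int(f["got"], 16),
        "expected": int(f["expected"], 16),
        "device_offset": int(f["dev_off"], 16),
        # Assumption: the length after "~" is printed in hex without a 0x prefix.
        "length": int(f["dev_len"], 16),
    }


sample = (
    "2018-02-21 21:12:18.037974 7f3479fe2d00 -1 bluestore(/var/lib/ceph/osd/ceph-7) "
    "_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x84c097b0, "
    "expected 0xaf1040a2, device location [0x10000~1000], logical extent 0x0~1000, "
    "object #-1:7b3f43c4:::osd_superblock:0#"
)
info = parse_csum_error(sample)
print(info)
```

Collecting the device offsets from repeated failures may help show whether the bad reads cluster in one region of the device or are scattered, which would point more toward the I/O path than toward specific bad cells.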

On Wed, Feb 21, 2018 at 5:06 PM, Behnam Loghmani <behnam.loghmani@xxxxxxxxx> wrote:
But the disks pass all the tests with smartctl and badblocks, and there are no errors on the disks. Because the SSD contains the WAL/DB of the OSDs, it is difficult to test it on other cluster nodes.

On Wed, Feb 21, 2018 at 4:58 PM,  <knawnd@xxxxxxxxx> wrote:
Could the problem be related to some faulty hardware (RAID controller, port, cable) rather than the disk? Does the "faulty" disk work OK on another server?

Behnam Loghmani wrote on 21/02/18 16:09:

Hi there,

I replaced the SSD on the problematic node with a new one and reconfigured the OSDs and MON service on it.

But the problem occurred again with:

"rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2"

I am completely confused now.

On Tue, Feb 20, 2018 at 5:16 PM, Behnam Loghmani <behnam.loghmani@xxxxxxxxx> wrote:

    Hi Caspar,

    I checked the filesystem and there aren't any errors on it.

    The disk is an SSD and it doesn't report any wear-level attribute in smartctl, and the filesystem is mounted with default options and no discard.

    My Ceph layout on this node is as follows:

    It runs OSD, MON, and RGW services.

    1 SSD for OS and WAL/DB

    2 HDD

    OSDs are created by ceph-volume lvm.

    the whole SSD is on 1 vg.

    OS is on root lv

    OSD.1 DB is on db-a

    OSD.1 WAL is on wal-a

    OSD.2 DB is on db-b

    OSD.2 WAL is on wal-b

    output of lvs:

       data-a data-a -wi-a-----

       data-b data-b -wi-a-----

       db-a   vg0    -wi-a-----

       db-b   vg0    -wi-a-----

       root   vg0    -wi-ao----

       wal-a  vg0    -wi-a-----

       wal-b  vg0    -wi-a-----
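
To make the shared-device dependency in the listing above explicit, here is a small Python sketch that groups the logical volumes by volume group. The header row is missing from the pasted output, so the column order (LV, VG, Attr) is an assumption based on the usual leading columns of `lvs`; `group_by_vg` is a hypothetical helper name.

```python
from collections import defaultdict

# The lvs rows as pasted in the message; assumed columns: LV, VG, Attr.
LVS_OUTPUT = """\
data-a data-a -wi-a-----
data-b data-b -wi-a-----
db-a   vg0    -wi-a-----
db-b   vg0    -wi-a-----
root   vg0    -wi-ao----
wal-a  vg0    -wi-a-----
wal-b  vg0    -wi-a-----
"""


def group_by_vg(text):
    """Map each volume group name to the list of logical volumes it contains."""
    groups = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        lv, vg, _attr = parts[:3]
        groups[vg].append(lv)
    return dict(groups)


groups = group_by_vg(LVS_OUTPUT)
for vg, lvs in sorted(groups.items()):
    print(vg, lvs)
```

With this layout, vg0 holds root plus the DB and WAL volumes of both OSDs, so any fault in the SSD or its cable, port, or controller would affect the OS, the monitor store, and both OSDs at once.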

    After a heavy write on the radosgw, OSD.1 and OSD.2 stopped with a "block checksum mismatch" error.

    Now the MON and OSD services on this node have stopped working with this error.

    I think my issue is related to this bug: http://tracker.ceph.com/issues/22102

    I ran

    #ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1 --deep 1

    but it returns the same error:

    *** Caught signal (Aborted) **

      in thread 7fbf6c923d00 thread_name:ceph-bluestore-

    2018-02-20 16:44:30.128787 7fbf6c923d00 -1 abort: Corruption: block checksum mismatch

      ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

      1: (()+0x3eb0b1) [0x55f779e6e0b1]

      2: (()+0xf5e0) [0x7fbf61ae15e0]

      3: (gsignal()+0x37) [0x7fbf604d31f7]

      4: (abort()+0x148) [0x7fbf604d48e8]

      5: (RocksDBStore::get(std::string const&, char const*, unsigned long, ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]

      6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545) [0x55f779cd8f75]

      7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]

      8: (main()+0xde0) [0x55f779baab90]

      9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]

      10: (()+0x1bc59f) [0x55f779c3f59f]

    2018-02-20 16:44:30.131334 7fbf6c923d00 -1 *** Caught signal (Aborted) **

      in thread 7fbf6c923d00 thread_name:ceph-bluestore-

      ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

      1: (()+0x3eb0b1) [0x55f779e6e0b1]

      2: (()+0xf5e0) [0x7fbf61ae15e0]

      3: (gsignal()+0x37) [0x7fbf604d31f7]

      4: (abort()+0x148) [0x7fbf604d48e8]

      5: (RocksDBStore::get(std::string const&, char const*, unsigned long, ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]

      6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545) [0x55f779cd8f75]

      7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]

      8: (main()+0xde0) [0x55f779baab90]

      9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]

      10: (()+0x1bc59f) [0x55f779c3f59f]

      NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

         -1> 2018-02-20 16:44:30.128787 7fbf6c923d00 -1 abort: Corruption: block checksum mismatch

          0> 2018-02-20 16:44:30.131334 7fbf6c923d00 -1 *** Caught signal (Aborted) **

      in thread 7fbf6c923d00 thread_name:ceph-bluestore-

      ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

      1: (()+0x3eb0b1) [0x55f779e6e0b1]

      2: (()+0xf5e0) [0x7fbf61ae15e0]

      3: (gsignal()+0x37) [0x7fbf604d31f7]

      4: (abort()+0x148) [0x7fbf604d48e8]

      5: (RocksDBStore::get(std::string const&, char const*, unsigned long, ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]

      6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545) [0x55f779cd8f75]

      7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]

      8: (main()+0xde0) [0x55f779baab90]

      9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]

      10: (()+0x1bc59f) [0x55f779c3f59f]

      NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

    Could you please help me recover this node or find a way to prove that the SSD is the problem?

    Best regards,

    Behnam Loghmani

    On Mon, Feb 19, 2018 at 1:35 PM, Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:

        Hi Behnam,

        I would first recommend running a filesystem check on the monitor disk to see if there are any inconsistencies.

        Is the disk the monitor runs on a spinning disk or an SSD?

        If it is an SSD, you should check the wear-level stats through smartctl.

        Is trim (discard) perhaps enabled on the filesystem mount? (discard can cause problems/corruption in combination with certain SSD firmwares)
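
One way to answer the discard question is to scan the mount table. The following is a minimal Python sketch, assuming a Linux host where `/proc/mounts` uses the usual `device mountpoint fstype options dump pass` layout; the sample entries and the helper name `mounts_with_discard` are hypothetical, and only an exact `discard` option is matched.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mountpoint) pairs whose mount options include 'discard'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        if "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits


# Hypothetical sample in /proc/mounts format, not taken from the affected node.
SAMPLE = """\
/dev/mapper/vg0-root / xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/sdb1 /mnt/scratch ext4 rw,relatime,discard 0 0
"""
print(mounts_with_discard(SAMPLE))

# On a live host the same function could be fed the real table:
#   with open("/proc/mounts") as f:
#       print(mounts_with_discard(f.read()))
```

An empty result only says that online discard is not set as a mount option; periodic trimming via a timer or cron job would not show up here.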

        Caspar

        2018-02-16 23:03 GMT+01:00 Behnam Loghmani <behnam.loghmani@xxxxxxxxx>:

            I checked the disk that the monitor is on with smartctl; it didn't return any errors and it doesn't have any Current_Pending_Sector.

            Can you recommend any disk checks to confirm that this disk has a problem, so that I can send the report to the provider to have the disk replaced?

            On Sat, Feb 17, 2018 at 1:09 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

                The disk that the monitor is on...there isn't anything for you to configure about a

                monitor WAL though so I'm not sure how that enters into it?

                On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani <behnam.loghmani@xxxxxxxxx> wrote:

                    Thanks for your reply

                    Do you mean that the problem is with the disk I use for WAL and DB?

                    On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

                        On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani <behnam.loghmani@xxxxxxxxx> wrote:

                            Hi there,

                            I have a Ceph cluster, version 12.2.2, on CentOS 7.

                            It is a testing cluster that I set up 2 weeks ago.

                            After some days, I saw that one of the three mons had stopped (out of quorum) and I can't start it anymore.

                            I checked the mon service log and the output shows this error:

                            """

                            mon.XXXXXX@-1(probing) e4 preinit clean up potentially inconsistent

                            store state

                            rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch 

                        This bit is the important one. Your disk is bad and it’s feeding back

                        corrupted data.

                            code = 2 Rocksdb transaction:

                                  0> 2018-02-16 17:37:07.041812 7f45a1e52e40 -1

                            /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: In function 'void MonitorDBStore::clear(std::set<std::basic_string<char> >&)' thread 7f45a1e52e40 time 2018-02-16 17:37:07.040846

                            /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: 581: FAILED assert(r >= 0)

                            """

                            The only solution I found was to remove this mon from the quorum, remove all of its mon data, and re-add it to the quorum.

                            After that, Ceph returned to a healthy status.

                            But now, after some days, this mon has stopped and I face the same problem again.

                            My cluster setup is:

                            4 osd hosts

                            total 8 osds

                            3 mons

                            1 rgw

                            This cluster was set up with ceph-volume lvm, with WAL/DB separated onto logical volumes.

                            Best regards,

                            Behnam Loghmani

                            _______________________________________________

                            ceph-users mailing list

                            ceph-users@xxxxxxxxxxxxxx

                            http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
