Re: OSDs failing to start due to crc32 and osdmap error

Hi,

It looks like this line, which I got from:

# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-888 --deep 1

_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x73a13c49, expected 0x21d59f5e, device location [0x268e810000~1000], logical extent 0x30000~1000, object #-1:2b46bd33:::osdmap.927580:0#

was the most interesting one, because it gave the number of the corrupted epoch (927580).
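
For anyone hitting the same thing, here is a minimal sketch of pulling the useful fields out of such a _verify_csum line. The function names are hypothetical helpers of mine, not part of Ceph, and I am assuming that both numbers in the "device location" field are hexadecimal (which matches the crc32c/0x1000 chunk size in the message):

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ceph) for reading a BlueStore
# "_verify_csum" log line.

# Print the epoch if the damaged object is named "osdmap.<epoch>".
# Returns non-zero for any other object (e.g. osd_superblock).
csum_line_epoch() {
    local epoch
    epoch=$(printf '%s\n' "$1" |
        sed -n 's/.*:::osdmap\.\([0-9][0-9]*\):.*/\1/p')
    [ -n "$epoch" ] || return 1
    printf '%s\n' "$epoch"
}

# Print "<offset> <length>" in decimal bytes, taken from the
# "device location [0x<off>~<len>]" field.
# Assumption: both values are hex, as in the lines quoted in this thread.
csum_line_extent() {
    local re off len
    re='.*device location \[0x\([0-9a-f]*\)~\([0-9a-f]*\)].*'
    off=$(printf '%s\n' "$1" | sed -n "s/$re/\1/p")
    len=$(printf '%s\n' "$1" | sed -n "s/$re/\2/p")
    [ -n "$off" ] && [ -n "$len" ] || return 1
    printf '%d %d\n' "0x$off" "0x$len"
}
```

Passing the line above to csum_line_epoch prints the epoch to fetch from the mon. For the osd_superblock line from the OSD log it returns non-zero, since that object name carries no epoch.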

The following brought the OSDs back to life:
# ceph osd getmap -o osdmap.927580 927580
# CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-888/ --op set-osdmap --file osdmap.927580
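
In case more OSDs are affected, the two commands above can be sketched as a small dry-run helper that only prints them for a given OSD id and epoch, so they can be reviewed before anything is touched. The function name is hypothetical, and the data path simply follows the /var/lib/ceph/osd/ceph-<id> layout used in this thread, so adjust it if yours differs. Note that ceph-objectstore-tool needs the OSD to be stopped, and that --bluestore-ignore-data-csum tells BlueStore to skip data checksum verification, so I would use it only for this repair:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do NOT execute) the osdmap fetch and
# injection commands from this thread for one OSD and one epoch.
print_osdmap_reinject() {
    local osd_id=$1 epoch=$2 arg map_file
    # Accept only plain non-negative integers for both arguments.
    for arg in "$osd_id" "$epoch"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "usage: print_osdmap_reinject <osd-id> <epoch>" >&2
                return 2
                ;;
        esac
    done
    map_file="osdmap.${epoch}"
    # Step 1: fetch the requested epoch from the monitors.
    printf 'ceph osd getmap -o %s %s\n' "$map_file" "$epoch"
    # Step 2: write it into the stopped OSD's object store.
    # Assumed path layout: /var/lib/ceph/osd/ceph-<id>/
    printf 'CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-%s/ --op set-osdmap --file %s\n' \
        "$osd_id" "$map_file"
}
```

For example, print_osdmap_reinject 888 927580 prints exactly the two commands I ran.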

thx!

On 11/27/23 20:59, Wesley Dillingham wrote:
So those options are not consistent with the error in the video I linked.

I am not entirely sure how to proceed with your OSDs (how many are impacted?), but you may want to try an older osdmap epoch, fetched from the mon, in your osdmap injection: try rewinding one epoch at a time from the current one and see if that gets them to start.

Proceed with caution; I would test this as well.
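
A rough sketch of the "rewind one epoch at a time" idea: a helper that only lists the candidate epochs to try, newest first, starting just below the current one. The function name and the count argument are made up for illustration. Each candidate would still have to be fetched from the mon and injected by hand, and epochs that the monitors have already trimmed will not be available:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the epochs to try, one step back at a time,
# starting just below the current epoch. It does not touch the cluster.
rewind_epochs() {
    local current=$1 count=$2 i=1
    # Stop after <count> candidates or when epoch 1 has been listed.
    while [ "$i" -le "$count" ] && [ $((current - i)) -ge 1 ]; do
        echo $((current - i))
        i=$((i + 1))
    done
}

# Example: the three epochs preceding 931991.
rewind_epochs 931991 3
# prints 931990, 931989 and 931988, one per line
```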

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, Nov 27, 2023 at 2:36 PM Denis Polom <denispolom@xxxxxxxxx> wrote:

    it's:

    "bluestore_compression_algorithm": "snappy"

    "bluestore_compression_mode": "none"


    On 11/27/23 20:13, Wesley Dillingham wrote:
    How about these two options:

    bluestore_compression_algorithm
    bluestore_compression_mode

    Thanks.



    On Mon, Nov 27, 2023 at 2:01 PM Denis Polom
    <denispolom@xxxxxxxxx> wrote:

        Hi,

        No, we don't:

        "bluestore_rocksdb_options":
        "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824",

        thx

        On 11/27/23 19:17, Wesley Dillingham wrote:
        Curious if you are using bluestore compression?



        On Mon, Nov 27, 2023 at 10:09 AM Denis Polom
        <denispolom@xxxxxxxxx> wrote:

            Hi

            We are having trouble starting some OSDs on one node of our Ceph
            Quincy 17.2.7 cluster. Some OSDs on that node are running fine, but
            others fail to start.

            It looks like a crc32 checksum error, followed by a failure to load
            the OSD map. I found some discussions about this, but nothing helped.

            I've also tried to inject the current OSD map, but that ends
            with an error:

            # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-888/ --op set-osdmap --file osdmap
            osdmap (#-1:20684533:::osdmap.931991:0#) does not exist.

            The log is below.

            Any ideas please?

            Thank you


            From the log file:

            2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public interface

            2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#

            2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load OSD map for epoch 927580, got 0 bytes

            /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time 2023-11-27T16:01:51.443522+0100
            /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x561ad07d2624]
              2: ceph-osd(+0xc2e836) [0x561ad07d2836]
              3: (OSD::init()+0x4026) [0x561ad08e5a86]
              4: main()
              5: __libc_start_main()
              6: _start()
            *** Caught signal (Aborted) **
              in thread 7f3f17aa13c0 thread_name:ceph-osd
            2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1 /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time 2023-11-27T16:01:51.443522+0100
            /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)

              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x561ad07d2624]
              2: ceph-osd(+0xc2e836) [0x561ad07d2836]
              3: (OSD::init()+0x4026) [0x561ad08e5a86]
              4: main()
              5: __libc_start_main()
              6: _start()


              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
              2: gsignal()
              3: abort()
              4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b7) [0x561ad07d268c]
              5: ceph-osd(+0xc2e836) [0x561ad07d2836]
              6: (OSD::init()+0x4026) [0x561ad08e5a86]
              7: main()
              8: __libc_start_main()
              9: _start()
            2023-11-27T16:01:51.447+0100 7f3f17aa13c0 -1 *** Caught signal (Aborted) **
              in thread 7f3f17aa13c0 thread_name:ceph-osd

              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
              2: gsignal()
              3: abort()
              4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b7) [0x561ad07d268c]
              5: ceph-osd(+0xc2e836) [0x561ad07d2836]
              6: (OSD::init()+0x4026) [0x561ad08e5a86]
              7: main()
              8: __libc_start_main()
              9: _start()
              NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


               -558> 2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public interface

                 -5> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#

                 -2> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load OSD map for epoch 927580, got 0 bytes

                 -1> 2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1 /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time 2023-11-27T16:01:51.443522+0100
            /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)

              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x561ad07d2624]
              2: ceph-osd(+0xc2e836) [0x561ad07d2836]
              3: (OSD::init()+0x4026) [0x561ad08e5a86]
              4: main()
              5: __libc_start_main()
              6: _start()


                  0> 2023-11-27T16:01:51.447+0100 7f3f17aa13c0 -1 *** Caught signal (Aborted) **
              in thread 7f3f17aa13c0 thread_name:ceph-osd

              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
              2: gsignal()
              3: abort()
              4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b7) [0x561ad07d268c]
              5: ceph-osd(+0xc2e836) [0x561ad07d2836]
              6: (OSD::init()+0x4026) [0x561ad08e5a86]
              7: main()
              8: __libc_start_main()
              9: _start()
              NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


               -562> 2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public interface

                 -9> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#

                 -6> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load OSD map for epoch 927580, got 0 bytes

                 -5> 2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1 /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time 2023-11-27T16:01:51.443522+0100
            /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)

              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x561ad07d2624]
              2: ceph-osd(+0xc2e836) [0x561ad07d2836]
              3: (OSD::init()+0x4026) [0x561ad08e5a86]
              4: main()
              5: __libc_start_main()
              6: _start()


                 -4> 2023-11-27T16:01:51.447+0100 7f3f17aa13c0 -1 *** Caught signal (Aborted) **
              in thread 7f3f17aa13c0 thread_name:ceph-osd

              ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
              1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
              2: gsignal()
              3: abort()
              4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b7) [0x561ad07d268c]
              5: ceph-osd(+0xc2e836) [0x561ad07d2836]
              6: (OSD::init()+0x4026) [0x561ad08e5a86]
              7: main()
              8: __libc_start_main()
              9: _start()
              NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


            Aborted
            _______________________________________________
            ceph-users mailing list -- ceph-users@xxxxxxx
            To unsubscribe send an email to ceph-users-leave@xxxxxxx
