OSD crash because RocksDB reports ‘Compaction error: Corruption: block checksum mismatch’


 



Hi all,

   My cluster is running 12.2.0 with BlueStore. Yesterday we ran an I/O test using fio with the librbd ioengine, and several OSDs crashed one after another.

   Hardware: 3 nodes, 30 OSDs, 1 TB SATA HDD for OSD data, 1 GB SATA SSD partition for the DB, 576 MB SATA SSD partition for the WAL.

   Ceph options set on the cluster:

   bluestore_shard_finishers = true

   mon_osd_prime_pg_temp = false

   mon_allow_pool_delete = true

   mgr_op_latency_sample_interval = 300

   

      -9> 2017-09-15 12:20:38.879807 7f079d1a4700  4 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 3] Compacting 1@1 + 1@2 files to L2, score 1.22

    -8> 2017-09-15 12:20:38.879814 7f079d1a4700  4 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 2 Base level 1, inputs: [792(66MB)], [406(65MB)]


    -7> 2017-09-15 12:20:38.879831 7f079d1a4700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1505449238879818, "job": 3, "event": "compaction_started", "files_L1": [792], "files_L2": [406], "score": 1.2195, "input_data_size": 138472863}

    -6> 2017-09-15 12:20:38.946227 7f07b7e07d00  1 freelist init

    -5> 2017-09-15 12:20:40.633404 7f079d1a4700  3 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch

    -4> 2017-09-15 12:20:40.633487 7f079d1a4700  4 rocksdb: (Original Log Time 2017/09/15-12:20:40.633205) [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[1 5 4 0 0 0 0] max score 0.96, MB/sec: 79.0 rd, 38.3 wr, level 2, files in(1, 1) out(1) MB in(66.5, 65.5) out(64.0), read-write-amplify(2.9) write-amplify(1.0) Corruption: block checksum mismatch, records in: 870254, records dropped: 500216


    -3> 2017-09-15 12:20:40.633502 7f079d1a4700  4 rocksdb: (Original Log Time 2017/09/15-12:20:40.633373) EVENT_LOG_v1 {"time_micros": 1505449240633323, "job": 3, "event": "compaction_finished", "compaction_time_micros": 1753285, "output_level": 2, "num_output_files": 1, "total_output_size": 67111607, "num_input_records": 857815, "num_output_records": 357599, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 1, "lsm_state": [1, 5, 4, 0, 0, 0, 0]}

    -2> 2017-09-15 12:20:40.633505 7f079d1a4700  2 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background error counts: 1

    -1> 2017-09-15 12:20:40.671905 7f07b7e07d00  1 bluestore(/var/lib/ceph/osd/ceph-11) _open_alloc opening allocation metadata

     0> 2017-09-15 12:20:40.678281 7f07b7e07d00 -1 /clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/os/bluestore/BitAllocator.cc: In function 'virtual void BitAllocator::free_blocks(int64_t, int64_t)' thread 7f07b7e07d00 time 2017-09-15 12:20:40.675594

/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/os/bluestore/BitAllocator.cc: 1270: FAILED assert(start_block + num_blocks <= size())


 ceph version 12.2.0-2 (d177b39d8bf8a81dfacff53487d7d9747e6eadad) luminous (stable)

 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f07b88b9970]

 2: (()+0xa3411a) [0x7f07b887011a]

 3: (BitMapAllocator::insert_free(unsigned long, unsigned long)+0x9d) [0x7f07b886ebcd]

 4: (BitMapAllocator::init_add_free(unsigned long, unsigned long)+0xd3) [0x7f07b886f173]

 5: (BlueStore::_open_alloc()+0x1c0) [0x7f07b8727970]

 6: (BlueStore::_mount(bool)+0x443) [0x7f07b8794fa3]

 7: (OSD::init()+0x3ba) [0x7f07b834e35a]

 8: (main()+0x2def) [0x7f07b825552f]

 9: (__libc_start_main()+0xf5) [0x7f07b4473af5]

 10: (()+0x4b7cc6) [0x7f07b82f3cc6]

 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
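
For anyone triaging a similar dump, here is a minimal sketch in Python of how the RocksDB lines above could be scanned: it pulls the JSON payload out of each `EVENT_LOG_v1` line and collects any `Compaction error:` text. The function names (`scan_osd_log`, `summarize_compactions`) and the regular expressions are my own assumptions for illustration, not part of Ceph or RocksDB tooling; the sample lines are copied from the log in this post.

```python
import json
import re

# Assumption: the JSON payload is everything after the EVENT_LOG_v1 tag,
# as in the log lines quoted in this post.
EVENT_RE = re.compile(r"EVENT_LOG_v1 (\{.*\})\s*$")
# Assumption: the error line carries the literal text "Compaction error: ".
ERROR_RE = re.compile(r"Compaction error: (.+)$")

# Three lines copied from the OSD log above.
SAMPLE = [
    '    -7> 2017-09-15 12:20:38.879831 7f079d1a4700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1505449238879818, "job": 3, "event": "compaction_started", "files_L1": [792], "files_L2": [406], "score": 1.2195, "input_data_size": 138472863}',
    '    -5> 2017-09-15 12:20:40.633404 7f079d1a4700  3 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch',
    '    -3> 2017-09-15 12:20:40.633502 7f079d1a4700  4 rocksdb: (Original Log Time 2017/09/15-12:20:40.633373) EVENT_LOG_v1 {"time_micros": 1505449240633323, "job": 3, "event": "compaction_finished", "compaction_time_micros": 1753285, "output_level": 2, "num_output_files": 1, "total_output_size": 67111607, "num_input_records": 857815, "num_output_records": 357599, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 1, "lsm_state": [1, 5, 4, 0, 0, 0, 0]}',
]


def scan_osd_log(lines):
    """Return (events, errors) found in an iterable of OSD log lines."""
    events, errors = [], []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append(json.loads(match.group(1)))
            continue
        match = ERROR_RE.search(line)
        if match:
            errors.append(match.group(1).strip())
    return events, errors


def summarize_compactions(events):
    """Group compaction_started/compaction_finished events by job id."""
    jobs = {}
    for event in events:
        job = jobs.setdefault(event.get("job"), {})
        if event.get("event") == "compaction_started":
            job["input_bytes"] = event.get("input_data_size")
        elif event.get("event") == "compaction_finished":
            job["output_bytes"] = event.get("total_output_size")
            job["records_in"] = event.get("num_input_records")
            job["records_out"] = event.get("num_output_records")
    return jobs


if __name__ == "__main__":
    found_events, found_errors = scan_osd_log(SAMPLE)
    print(summarize_compactions(found_events))
    print(found_errors)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `scan_osd_log(open(path))` with `path` pointing at the OSD log. This only summarizes what RocksDB logged; it does not tell you which SST block failed its checksum.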


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
