Hi all,
My cluster is running 12.2.0 with BlueStore. Yesterday we ran an I/O test using fio with the librbd ioengine, and several OSDs crashed one after another.
3 nodes, 30 OSDs; 1 TB SATA HDD for OSD data, 1 GB SATA SSD partition for the DB, 576 MB SATA SSD partition for the WAL.
ceph options:
bluestore_shard_finishers = true
mon_osd_prime_pg_temp = false
mon_allow_pool_delete = true
mgr_op_latency_sample_interval = 300
-9> 2017-09-15 12:20:38.879807 7f079d1a4700 4 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 3] Compacting 1@1 + 1@2 files to L2, score 1.22
-8> 2017-09-15 12:20:38.879814 7f079d1a4700 4 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 2 Base level 1, inputs: [792(66MB)], [406(65MB)]
-7> 2017-09-15 12:20:38.879831 7f079d1a4700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1505449238879818, "job": 3, "event": "compaction_started", "files_L1": [792], "files_L2": [406], "score": 1.2195, "input_data_size": 138472863}
-6> 2017-09-15 12:20:38.946227 7f07b7e07d00 1 freelist init
-5> 2017-09-15 12:20:40.633404 7f079d1a4700 3 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
-4> 2017-09-15 12:20:40.633487 7f079d1a4700 4 rocksdb: (Original Log Time 2017/09/15-12:20:40.633205) [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[1 5 4 0 0 0 0] max score 0.96, MB/sec: 79.0 rd, 38.3 wr, level 2, files in(1, 1) out(1) MB in(66.5, 65.5) out(64.0), read-write-amplify(2.9) write-amplify(1.0) Corruption: block checksum mismatch, records in: 870254, records dropped: 500216
-3> 2017-09-15 12:20:40.633502 7f079d1a4700 4 rocksdb: (Original Log Time 2017/09/15-12:20:40.633373) EVENT_LOG_v1 {"time_micros": 1505449240633323, "job": 3, "event": "compaction_finished", "compaction_time_micros": 1753285, "output_level": 2, "num_output_files": 1, "total_output_size": 67111607, "num_input_records": 857815, "num_output_records": 357599, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 1, "lsm_state": [1, 5, 4, 0, 0, 0, 0]}
-2> 2017-09-15 12:20:40.633505 7f079d1a4700 2 rocksdb: [/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background error counts: 1
-1> 2017-09-15 12:20:40.671905 7f07b7e07d00 1 bluestore(/var/lib/ceph/osd/ceph-11) _open_alloc opening allocation metadata
0> 2017-09-15 12:20:40.678281 7f07b7e07d00 -1 /clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/os/bluestore/BitAllocator.cc: In function 'virtual void BitAllocator::free_blocks(int64_t, int64_t)' thread 7f07b7e07d00 time 2017-09-15 12:20:40.675594
/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/os/bluestore/BitAllocator.cc: 1270: FAILED assert(start_block + num_blocks <= size())
ceph version 12.2.0-2 (d177b39d8bf8a81dfacff53487d7d9747e6eadad) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f07b88b9970]
2: (()+0xa3411a) [0x7f07b887011a]
3: (BitMapAllocator::insert_free(unsigned long, unsigned long)+0x9d) [0x7f07b886ebcd]
4: (BitMapAllocator::init_add_free(unsigned long, unsigned long)+0xd3) [0x7f07b886f173]
5: (BlueStore::_open_alloc()+0x1c0) [0x7f07b8727970]
6: (BlueStore::_mount(bool)+0x443) [0x7f07b8794fa3]
7: (OSD::init()+0x3ba) [0x7f07b834e35a]
8: (main()+0x2def) [0x7f07b825552f]
9: (__libc_start_main()+0xf5) [0x7f07b4473af5]
10: (()+0x4b7cc6) [0x7f07b82f3cc6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
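For anyone comparing this against logs from the other crashed OSDs, below is a rough Python sketch that pulls the RocksDB corruption messages and the failed assert out of an OSD log dump. It is not part of Ceph: the function name `summarize_crash_log` and both regular expressions are assumptions derived only from the log lines quoted above, so they may need adjusting for other log formats.

```python
import re

# Hypothetical patterns, inferred only from the log excerpt in this post.
# Matches rocksdb lines that carry a "Corruption: <reason>" message.
CORRUPTION_RE = re.compile(r"rocksdb: .*Corruption: (?P<reason>[^,]+)")
# Matches "<source file>: <line>: FAILED assert(<condition>)".
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)\s*$"
)


def summarize_crash_log(text):
    """Collect corruption reasons and failed asserts from OSD log text."""
    corruption = []
    asserts = []
    for line in text.splitlines():
        match = CORRUPTION_RE.search(line)
        if match:
            corruption.append(match.group("reason").strip())
        match = ASSERT_RE.search(line)
        if match:
            asserts.append(
                (match.group("file"), int(match.group("line")), match.group("cond"))
            )
    return {"corruption": corruption, "asserts": asserts}
```

Running it over each crashed OSD's log should show whether every crash has the same pair of symptoms (a block checksum mismatch during compaction, followed by the `BitAllocator.cc` assert in `_open_alloc`) or whether some OSDs fail differently.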