Hello,

I am running a Ceph cluster on Luminous 12.2.8 with 36 OSDs. Today a deep scrub found an error on PG 25.60, and later one of the OSDs failed. PG 25.60 is now stuck in the active+undersized+degraded+inconsistent state. I can't repair it with ceph pg repair 25.60; the repair process does not start at all. What is the correct recovery process for this situation? (The exact commands I have tried so far are listed at the end of this message.)

=== ceph health detail ===
HEALTH_ERR 1 osds down; 1 scrub errors; Possible data damage: 1 pg inconsistent; Degraded data redundancy: 188063/5900718 objects degraded (3.187%), 117 pgs degraded, 117 pgs undersized
OSD_DOWN 1 osds down
    osd.6 (root=default,host=hv203) is down
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 25.60 is active+undersized+degraded+inconsistent, acting [25,4]

=== ceph.log ===
2019-04-26 04:01:35.129464 osd.25 osd.25 10.4.5.207:6800/2469060 167 : cluster [ERR] 25.60 shard 6: soid 25:065e49e9:::rbd_data.3759266b8b4567.0000000000018202:head candidate had a read error
2019-04-26 04:03:31.533671 osd.25 osd.25 10.4.5.207:6800/2469060 168 : cluster [ERR] 25.60 deep-scrub 0 missing, 1 inconsistent objects
2019-04-26 04:03:31.533677 osd.25 osd.25 10.4.5.207:6800/2469060 169 : cluster [ERR] 25.60 deep-scrub 1 errors

=== ceph-osd.6.log ===
2019-04-26 04:53:17.939436 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 284] Compacting 4@0 + 4@1 files to L1, score 1.00
2019-04-26 04:53:17.939715 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 283 Base level 0, inputs: [31929(25MB) 31927(21MB) 31925(22MB) 31923(26MB)], [31912(65MB) 31913(65MB) 31914(65MB) 31915(29MB)]
2019-04-26 04:53:17.939978 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243597939747, "job": 284, "event": "compaction_started", "files_L0": [31929, 31927, 31925, 31923], "files_L1": [31912, 31913, 31914, 31915], "score": 1, "input_data_size": 339668148}
2019-04-26 04:53:21.500373 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 284] Generated table #31930: 380678 keys, 69567323 bytes
2019-04-26 04:53:21.500410 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243601500399, "cf_name": "default", "job": 284, "event": "table_file_creation", "file_number": 31930, "file_size": 69567323, "table_properties": {"data_size": 67110779, "index_size": 1349659, "filter_size": 1105896, "raw_key_size": 22641147, "raw_average_key_size": 59, "raw_value_size": 59452413, "raw_average_value_size": 156, "num_data_blocks": 16601, "num_entries": 380678, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "161626", "kMergeOperands": "0"}}
2019-04-26 04:53:24.294928 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 284] Generated table #31931: 118877 keys, 69059681 bytes
2019-04-26 04:53:24.294964 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243604294950, "cf_name": "default", "job": 284, "event": "table_file_creation", "file_number": 31931, "file_size": 69059681, "table_properties": {"data_size": 67109949, "index_size": 1495694, "filter_size": 453050, "raw_key_size": 10391245, "raw_average_key_size": 87, "raw_value_size": 63028568, "raw_average_value_size": 530, "num_data_blocks": 16621, "num_entries": 118877, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "1266", "kMergeOperands": "0"}}
2019-04-26 04:53:27.979518 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 284] Generated table #31932: 119238 keys, 69066929 bytes
2019-04-26 04:53:27.979545 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243607979532, "cf_name": "default", "job": 284, "event": "table_file_creation", "file_number": 31932, "file_size": 69066929, "table_properties": {"data_size": 67112338, "index_size": 1499661, "filter_size": 453942, "raw_key_size": 10424324, "raw_average_key_size": 87, "raw_value_size": 63036698, "raw_average_value_size": 528, "num_data_blocks": 16599, "num_entries": 119238, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "3045", "kMergeOperands": "0"}}
2019-04-26 04:53:31.014387 7f6a8ae48700 3 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
2019-04-26 04:53:31.014409 7f6a8ae48700 4 rocksdb: (Original Log Time 2019/04/26-04:53:31.012695) [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[4 4 16 0 0 0 0] max score 0.29, MB/sec: 26.0 rd, 21.2 wr, level 1, files in(4, 4) out(4) MB in(97.3, 226.6) out(263.9), read-write-amplify(6.0) write-amplify(2.7) Corruption: block checksum mismatch, records in: 975162, records dropped: 32804
2019-04-26 04:53:31.014413 7f6a8ae48700 4 rocksdb: (Original Log Time 2019/04/26-04:53:31.014231) EVENT_LOG_v1 {"time_micros": 1556243611012706, "job": 284, "event": "compaction_finished", "compaction_time_micros": 13072480, "output_level": 1, "num_output_files": 4, "total_output_size": 276762847, "num_input_records": 788663, "num_output_records": 755859, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [4, 4, 16, 0, 0, 0, 0]}
2019-04-26 04:53:31.014415 7f6a8ae48700 2 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background error counts: 1
2019-04-26 04:53:31.143810 7f6a9ae68700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key = 0x0000000000097374'.0000018493.00000000000000927272' Value size = 184)
Put( Prefix = M key = 0x0000000000097374'._fastinfo' Value size = 186)
Put( Prefix = O key = 0x7f80000000000000190fae4189217262'd_data.25db7f6b8b4567.0000000000001f45!='0xfffffffffffffffeffffffffffffffff6f00000000'x' Value size = 540)
Put( Prefix = O key = 0x7f80000000000000190fae4189217262'd_data.25db7f6b8b4567.0000000000001f45!='0xfffffffffffffffeffffffffffffffff'o' Value size = 429)
Put( Prefix = L key = 0x00000000003cc72c Value size = 16423)
2019-04-26 04:53:31.152093 7f6a9ae68700 -1 /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f6a9ae68700 time 2019-04-26 04:53:31.144381
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueStore.cc: 8537: FAILED assert(r == 0)

ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55acc062cab2]
 2: (BlueStore::_kv_sync_thread()+0x24b2) [0x55acc04ba332]
 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55acc04fa7ed]
 4: (()+0x7494) [0x7f6aab206494]
 5: (clone()+0x3f) [0x7f6aaa28dacf]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
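
In case it is useful, here is a rough sketch of what I have actually run so far, plus the diagnostic I am assuming would be the next step (the rados list-inconsistent-obj call is only my assumption of where to look next, not something I have gotten useful output from yet):

=== commands tried ===
# repair attempt; the repair never seems to start
ceph pg repair 25.60
# current cluster state (output shown above)
ceph health detail
# assumed next step: inspect the object flagged by the deep-scrub
rados list-inconsistent-obj 25.60 --format=json-pretty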