Status update:
Finally we have the first patch to fix the issue in master:
https://github.com/ceph/ceph/pull/35201
And the ticket has been updated with the root cause analysis:
https://tracker.ceph.com/issues/45613

On 5/21/2020 2:07 PM, Igor Fedotov wrote:
@Chris - unfortunately it looks like the corruption is permanent, since
the valid WAL data has presumably been overwritten with other data. Hence I
don't know of any way to recover. Perhaps you could try cutting the WAL
file off, which would allow the OSD to start, with some of the latest ops
lost. One could use the exported BlueFS files as a drop-in replacement for
the regular DB volume, but I'm not aware of the details.
The above are just speculations; I can't say for sure whether it would help...
I can't explain why the WAL doesn't have a zero block in your case, though.
There is a small chance this is a different issue. Just in case, could you
please search for 32K zero blocks over the whole file? And do the same for
the other OSD?
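
A quick, untested Python sketch along these lines should do for the search;
it just scans the exported WAL file for 32 KiB-aligned blocks that are all
zeros. Adjust the file name to whatever bluefs-export produced:

#!/usr/bin/env python3
# Scan an exported BlueFS WAL file for 32 KiB-aligned blocks that are
# entirely zero and print their offsets. Untested sketch.
import sys

BLOCK = 32 * 1024  # 32 KiB

path = sys.argv[1] if len(sys.argv) > 1 else "db.wal/002040.log"
with open(path, "rb") as f:
    offset = 0
    while True:
        chunk = f.read(BLOCK)
        if not chunk:
            break
        if len(chunk) == BLOCK and chunk == bytes(BLOCK):
            print(f"all-zero 32K block at offset {offset:#x}")
        offset += len(chunk)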
Thanks,
Igor
Short update on the issue:
Finally we were able to reproduce the issue in master (not Octopus);
investigating further...
@Chris - to make sure you're facing the same issue, could you please
check the content of the broken file. To do so:
1) Run "ceph-bluestore-tool --path <path-to-osd> --out-dir <target
dir> --command bluefs-export".
This will export the BlueFS files to <target dir>.
2) Check the content of the file db.wal/002040.log at offset 0x470000.
This will presumably contain 32K of zero bytes. Is this the case? (See
the sketch right after these steps.)
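
For step 2, a quick untested Python sketch like the following should tell
you; the path is just the exported file from step 1, so adjust it to your
<target dir>:

#!/usr/bin/env python3
# Check whether the exported WAL file contains 32 KiB of zero bytes
# at offset 0x470000. Untested sketch; adjust the path to your export dir.
path = "db.wal/002040.log"

with open(path, "rb") as f:
    f.seek(0x470000)
    data = f.read(32 * 1024)

if len(data) == 32 * 1024 and data == bytes(len(data)):
    print("offset 0x470000: 32K of zeros - same symptom")
else:
    print("offset 0x470000: not all zeros (or short read)")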
No hurry as I'm just making sure symptoms in Octopus are the same...
Thanks,
Igor
On 5/20/2020 5:24 PM, Igor Fedotov wrote:
Chris,
got them, thanks!
Investigating....
Thanks,
Igor
On 5/20/2020 5:23 PM, Chris Palmer wrote:
Hi Igor
I've sent you these directly as they're a bit chunky. Let me know if
you haven't got them.
Thx, Chris
On 20/05/2020 14:43, Igor Fedotov wrote:
Hi Chris,
could you please share the full log prior to the first failure?
Also, if possible, please set debug-bluestore/debug-bluefs to 20 and
collect another log from a failed OSD startup.
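
E.g. something like the following in ceph.conf on that node before
retrying the OSD start should do (just a suggestion; the equivalent
--debug-bluestore / --debug-bluefs command line options work as well):

[osd]
debug bluestore = 20
debug bluefs = 20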
Thanks,
Igor
On 5/20/2020 4:39 PM, Chris Palmer wrote:
I'm getting similar errors after rebooting a node. Cluster was
upgraded 15.2.1 -> 15.2.2 yesterday. No problems after rebooting
during upgrade.
On the node I just rebooted, 2/4 OSDs won't restart. Similar logs
from both. Logs from one below.
Neither OSD has compression enabled, although there is a
compression-related error in the log.
Both are replicated x3. One has data on HDD with a separate WAL/DB on an
NVMe partition; the other has everything on an NVMe partition only.
Feeling kinda nervous here - advice welcomed!!
Thx, Chris
2020-05-20T13:14:00.837+0100 7f2e0d273700 3 rocksdb:
[table/block_based_table_reader.cc:1117] Encountered error while
reading data from compression dictionary block Corruption: block
checksum mismatch: expected 0, got 3423870535 in db/000304.sst
offset 18446744073709551615 size 18446744073709551615
2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb:
[db/version_set.cc:3757] Recovered from manifest
file:db/MANIFEST-000312 succeeded,manifest_file_number is 312,
next_file_number is 314, last_sequence is 22320582, log_number is
309,prev_log_number is 0,max_column_family is
0,min_log_number_to_keep is 0
2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb:
[db/version_set.cc:3766] Column family [default] (ID 0), log
number is 309
2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb: EVENT_LOG_v1
{"time_micros": 1589976840843199, "job": 1, "event":
"recovery_started", "log_files": [313]}
2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb:
[db/db_impl_open.cc:583] Recovering log #313 mode 0
2020-05-20T13:14:00.937+0100 7f2e1957ee00 3 rocksdb:
[db/db_impl_open.cc:518] db.wal/000313.log: dropping 9044 bytes;
Corruption: error in middle of record
2020-05-20T13:14:00.937+0100 7f2e1957ee00 3 rocksdb:
[db/db_impl_open.cc:518] db.wal/000313.log: dropping 86 bytes;
Corruption: missing start of fragmented record(2)
2020-05-20T13:14:00.937+0100 7f2e1957ee00 4 rocksdb:
[db/db_impl.cc:390] Shutdown: canceling all background work
2020-05-20T13:14:00.937+0100 7f2e1957ee00 4 rocksdb:
[db/db_impl.cc:563] Shutdown complete
2020-05-20T13:14:00.937+0100 7f2e1957ee00 -1 rocksdb: Corruption:
error in middle of record
2020-05-20T13:14:00.937+0100 7f2e1957ee00 -1
bluestore(/var/lib/ceph/osd/ceph-9) _open_db erroring opening db:
2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 bluefs umount
2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 fbmap_alloc
0x55daf2b3a900 shutdown
2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 bdev(0x55daf3838700
/var/lib/ceph/osd/ceph-9/block) close
2020-05-20T13:14:01.093+0100 7f2e1957ee00 1 bdev(0x55daf3838000
/var/lib/ceph/osd/ceph-9/block) close
2020-05-20T13:14:01.341+0100 7f2e1957ee00 -1 osd.9 0 OSD:init:
unable to mount object store
2020-05-20T13:14:01.341+0100 7f2e1957ee00 -1 ** ERROR:
osd init failed: (5) Input/output error
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx