Thanks Igor,

Do you have any idea of an ETA or plan for people running 15.2.2 to be able to patch / fix this issue? I had a read of the ticket, and it seems the corruption happens while the OSD is running, but the damaged WAL is not read until the OSD restarts. So I imagine we will need some form of fix / patch that can be applied to a running OSD before we restart it, since a normal OSD upgrade requires restarting the OSD to apply the new code, which would leave us with a corrupt OSD.

Thanks
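As an aside, for anyone else who wants to run the check Igor describes further down (export BlueFS with ceph-bluestore-tool, then look for a 32K block of zero bytes in the exported WAL file), here is a rough, untested Python sketch of that check. The file name db.wal/002040.log and the offset 0x470000 are just the examples from Igor's mail, so substitute the values for your own OSD; also note the whole-file scan only looks at 32K-aligned blocks, so it could miss an unaligned zero run.

#!/usr/bin/env python3
# Untested sketch: look for 32 KiB runs of zero bytes in an exported BlueFS WAL file.
# Usage: python3 <this-script> <path-to-exported-wal>, e.g. db.wal/002040.log
import sys

BLOCK = 32 * 1024          # 32K, the block size mentioned in the thread
SUSPECT = 0x470000         # offset from Igor's mail; adjust for your own OSD

path = sys.argv[1] if len(sys.argv) > 1 else "db.wal/002040.log"

with open(path, "rb") as f:
    # 1) Is the suspect offset a full 32K of zero bytes?
    f.seek(SUSPECT)
    chunk = f.read(BLOCK)
    if len(chunk) < BLOCK:
        print("offset 0x%x: file too short (only %d bytes there)" % (SUSPECT, len(chunk)))
    else:
        print("offset 0x%x: %s" % (SUSPECT,
              "32K of zeros" if chunk.count(0) == BLOCK else "NOT all zeros"))

    # 2) Scan the whole file for 32K-aligned all-zero blocks.
    f.seek(0)
    offset = 0
    while True:
        chunk = f.read(BLOCK)
        if not chunk:
            break
        if chunk.count(0) == len(chunk):
            print("all-zero block at offset 0x%x (%d bytes)" % (offset, len(chunk)))
        offset += len(chunk)
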
---- On Sat, 23 May 2020 00:12:59 +0800 Igor Fedotov <ifedotov@xxxxxxx> wrote ----

Status update:

Finally we have the first patch to fix the issue in master:
https://github.com/ceph/ceph/pull/35201

And the ticket has been updated with a root cause analysis:
https://tracker.ceph.com/issues/45613

On 5/21/2020 2:07 PM, Igor Fedotov wrote:

@Chris - unfortunately it looks like the corruption is permanent, since valid WAL data have presumably been overwritten with other data. Hence I don't know any way to recover - perhaps you can try cutting the WAL file off, which will allow the OSD to start, with some of the latest ops lost. One can use the exported BlueFS as a drop-in replacement for the regular DB volume, but I'm not aware of the details. And the above are just speculations - I can't say for sure whether it helps...

I can't explain why the WAL doesn't have a zero block in your case, though. There is a small chance this is a different issue. Just in case - could you please search for 32K zero blocks over the whole file? And the same for another OSD?

Thanks,
Igor

> Short update on the issue:
>
> Finally we're able to reproduce the issue in master (not Octopus),
> investigating further..
>
> @Chris - to make sure you're facing the same issue, could you please
> check the content of the broken file. To do so:
>
> 1) Run "ceph-bluestore-tool --path <path-to-osd> --out-dir <target dir> --command bluefs-export".
>
> This will export the bluefs files to <target dir>.
>
> 2) Check the content of file db.wal/002040.log at offset 0x470000.
>
> This will presumably contain 32K of zero bytes. Is this the case?
>
> No hurry as I'm just making sure the symptoms in Octopus are the same...
>
> Thanks,
>
> Igor
>
> On 5/20/2020 5:24 PM, Igor Fedotov wrote:
>> Chris,
>>
>> got them, thanks!
>>
>> Investigating....
>>
>> Thanks,
>>
>> Igor
>>
>> On 5/20/2020 5:23 PM, Chris Palmer wrote:
>>> Hi Igor
>>> I've sent you these directly as they're a bit chunky. Let me know if
>>> you haven't got them.
>>> Thx, Chris
>>>
>>> On 20/05/2020 14:43, Igor Fedotov wrote:
>>>> Hi Chris,
>>>>
>>>> could you please share the full log prior to the first failure?
>>>>
>>>> Also, if possible, please set debug-bluestore / debug-bluefs to 20 and
>>>> collect another log for a failed OSD startup.
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 5/20/2020 4:39 PM, Chris Palmer wrote:
>>>>> I'm getting similar errors after rebooting a node. The cluster was
>>>>> upgraded 15.2.1 -> 15.2.2 yesterday. No problems after rebooting
>>>>> during the upgrade.
>>>>>
>>>>> On the node I just rebooted, 2/4 OSDs won't restart. Similar logs
>>>>> from both; logs from one are below.
>>>>> Neither OSD has compression enabled, although there is a
>>>>> compression-related error in the log.
>>>>> Both are replicated x3. One has data on HDD with a separate WAL/DB on
>>>>> an NVMe partition; the other has everything on an NVMe partition only.
>>>>>
>>>>> Feeling kinda nervous here - advice welcomed!!
>>>>>
>>>>> Thx, Chris
>>>>>
>>>>> 2020-05-20T13:14:00.837+0100 7f2e0d273700 3 rocksdb: [table/block_based_table_reader.cc:1117] Encountered error while reading data from compression dictionary block Corruption: block checksum mismatch: expected 0, got 3423870535 in db/000304.sst offset 18446744073709551615 size 18446744073709551615
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-000312 succeeded,manifest_file_number is 312, next_file_number is 314, last_sequence is 22320582, log_number is 309,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 309
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1589976840843199, "job": 1, "event": "recovery_started", "log_files": [313]}
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #313 mode 0
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 3 rocksdb: [db/db_impl_open.cc:518] db.wal/000313.log: dropping 9044 bytes; Corruption: error in middle of record
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 3 rocksdb: [db/db_impl_open.cc:518] db.wal/000313.log: dropping 86 bytes; Corruption: missing start of fragmented record(2)
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 -1 rocksdb: Corruption: error in middle of record
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 -1 bluestore(/var/lib/ceph/osd/ceph-9) _open_db erroring opening db:
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 bluefs umount
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 fbmap_alloc 0x55daf2b3a900 shutdown
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 bdev(0x55daf3838700 /var/lib/ceph/osd/ceph-9/block) close
>>>>> 2020-05-20T13:14:01.093+0100 7f2e1957ee00 1 bdev(0x55daf3838000 /var/lib/ceph/osd/ceph-9/block) close
>>>>> 2020-05-20T13:14:01.341+0100 7f2e1957ee00 -1 osd.9 0 OSD:init: unable to mount object store
>>>>> 2020-05-20T13:14:01.341+0100 7f2e1957ee00 -1 ** ERROR: osd init failed: (5) Input/output error

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx