Dan, thanks for the info. Good to know.
The failed QA run in that ticket uses snappy, though.
And in fact anything writing to process memory can introduce data
corruption in a similar manner.
So I'll keep that in mind, but IMO the relation to compression is still not
evident...
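For what it's worth, if you want to double-check where compression is actually in play, something like the following should show it (the pool name below is just a placeholder, substitute the affected pools):

  # per-pool compression settings
  ceph osd pool get <pool-name> compression_mode
  ceph osd pool get <pool-name> compression_algorithm

  # cluster-wide BlueStore compression defaults
  ceph config get osd bluestore_compression_mode
  ceph config get osd bluestore_compression_algorithm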
Kind regards,
Igor
On 5/20/2020 3:32 PM, Dan van der Ster wrote:
lz4? It's not obviously related, but I've seen it involved in really
non-obvious ways: https://tracker.ceph.com/issues/39525
-- dan
On Wed, May 20, 2020 at 2:27 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
Thanks. FYI, the OSDs that went down back two pools: an erasure-code meta (RBD) pool and a CephFS meta pool. The CephFS pool does have compression enabled (I noticed it mentioned in the Ceph tracker).
Thanks
---- On Wed, 20 May 2020 20:17:33 +0800 Igor Fedotov <ifedotov@xxxxxxx> wrote ----
Hi Ashley,
looks like this is a regression. Neha observed similar error(s) during
her QA run; see https://tracker.ceph.com/issues/45613
Please preserve the broken OSDs for a while if possible; I'll likely come
back to you for more information to troubleshoot.
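If you want to capture the current state up front, a read-only fsck plus a BlueFS export of one of the broken OSDs is usually enough for offline analysis. A rough sketch (the OSD id and output path are just examples; the daemon has to be stopped, and on a cephadm deployment the tool needs to run inside the OSD container, e.g. via cephadm shell):

  # read-only consistency check of the BlueStore instance
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

  # export the BlueFS files (incl. the RocksDB WAL/SSTs) for offline inspection
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-0 --out-dir /tmp/osd-0-bluefs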
Thanks,
Igor
On 5/20/2020 1:26 PM, Ashley Merrick wrote:
Reading online, it looked like a dead-end error, so I recreated the 3 OSDs on that node, and they are now working fine after a reboot.
However, I restarted the next server with 3 OSDs and one of them is now facing the same issue.
Let me know if you need any more logs.
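In case it helps, the full startup output for a failing OSD can be pulled on the host with something like this (osd.2 is just an example id, and the fsid placeholder needs to be filled in):

  # journal for the failing OSD container, via cephadm
  cephadm logs --name osd.2

  # or directly via journald (the unit name includes the cluster fsid)
  journalctl -u ceph-<fsid>@osd.2 --no-pager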
Thanks
---- On Wed, 20 May 2020 17:02:31 +0800 Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote ----
I just upgraded a cephadm cluster from 15.2.1 to 15.2.2.
Everything went fine during the upgrade; however, after restarting one node that has 3 OSDs for ecmeta, two of the 3 OSDs now won't boot, with the following error:
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-002768 succeeded,manifest_file_number is 2768, next_file_number is 2775, last_sequence is 188026749, log_number is 2767,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 2767
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1589963382599157, "job": 1, "event": "recovery_started", "log_files": [2769]}
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #2769 mode 0
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 537526 bytes; Corruption: error in middle of record
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 23263 bytes; Corruption: missing start of fragmented record(2)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 rocksdb: Corruption: error in middle of record
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0700 /var/lib/ceph/osd/ceph-0/block) close
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.870+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0000 /var/lib/ceph/osd/ceph-0/block) close
May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 osd.0 0 OSD:init: unable to mount object store
May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 ** ERROR: osd init failed: (5) Input/output error
Have I hit a bug, or is there something I can do to try and fix these OSDs?
Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx