Dan, thanks for the info. Good to know.
The failed QA run in that ticket uses snappy, though.
And in fact anything writing to process memory can introduce data
corruption in a similar manner.
So I'll keep that in mind, but IMO the relation to compression is still not
evident...
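For what it's worth, if you want to double-check where compression is actually in play, something like the following should show it (the pool name below is just a placeholder, substitute the affected pools):

  # per-pool compression settings
  ceph osd pool get <pool-name> compression_mode
  ceph osd pool get <pool-name> compression_algorithm

  # cluster-wide BlueStore compression defaults
  ceph config get osd bluestore_compression_mode
  ceph config get osd bluestore_compression_algorithm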
Kind regards,
Igor
On 5/20/2020 3:32 PM, Dan van der Ster wrote:
lz4? It's not obviously related, but I've seen it involved in really
non-obvious ways: https://tracker.ceph.com/issues/39525
-- dan
On Wed, May 20, 2020 at 2:27 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
Thanks. FYI, the OSDs that went down back two pools: an erasure-code meta (RBD) pool and a CephFS meta pool. The CephFS pool does have compression enabled (I noticed it mentioned in the Ceph tracker).
Thanks
---- On Wed, 20 May 2020 20:17:33 +0800 Igor Fedotov <ifedotov@xxxxxxx> wrote ----
Hi Ashley,
looks like this is a regression. Neha observed similar error(s) during
her QA run; see https://tracker.ceph.com/issues/45613
Please preserve the broken OSDs for a while if possible; I'll likely come
back to you for more information to troubleshoot.
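If you want to capture the current state up front, a read-only fsck plus a BlueFS export of one of the broken OSDs is usually enough for offline analysis. A rough sketch (the OSD id and output path are just examples; the daemon has to be stopped, and on a cephadm deployment the tool needs to run inside the OSD container, e.g. via cephadm shell):

  # read-only consistency check of the BlueStore instance
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

  # export the BlueFS files (incl. the RocksDB WAL/SSTs) for offline inspection
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-0 --out-dir /tmp/osd-0-bluefs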
Thanks,
Igor
On 5/20/2020 1:26 PM, Ashley Merrick wrote:
Reading online, it looked like a dead-end error, so I recreated the 3 OSDs on that node, and they are now working fine after a reboot.
However, I restarted the next server with 3 OSDs and one of them is now facing the same issue.
Let me know if you need any more logs.
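In case it helps, the full startup output for a failing OSD can be pulled on the host with something like this (osd.2 is just an example id, and the fsid placeholder needs to be filled in):

  # journal for the failing OSD container, via cephadm
  cephadm logs --name osd.2

  # or directly via journald (the unit name includes the cluster fsid)
  journalctl -u ceph-<fsid>@osd.2 --no-pager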
Thanks
---- On Wed, 20 May 2020 17:02:31 +0800 Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote ----
I just upgraded a cephadm cluster from 15.2.1 to 15.2.2.
Everything went fine during the upgrade; however, after restarting one node that has 3 OSDs for ecmeta, two of the 3 OSDs now won't boot, with the following error:
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-002768 succeeded,manifest_file_number is 2768, next_file_number is 2775, last_sequence is 188026749, log_number is 2767,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 2767
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1589963382599157, "job": 1, "event": "recovery_started", "log_files": [2769]}
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #2769 mode 0
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 537526 bytes; Corruption: error in middle of record
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 23263 bytes; Corruption: missing start of fragmented record(2)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 rocksdb: Corruption: error in middle of record
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0700 /var/lib/ceph/osd/ceph-0/block) close
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.870+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0000 /var/lib/ceph/osd/ceph-0/block) close
May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 osd.0 0 OSD:init: unable to mount object store
May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 ** ERROR: osd init failed: (5) Input/output error
Have I hit a bug, or is there something I can do to try and fix these OSDs?
Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx