This is *probably* unrelated to the upgrade: it's complaining about
data corruption at a very early stage of startup, earlier than the
point where the known 12.2.9 bug would trigger. So this might just be
a coincidence with a bad disk.

That said: you are running a 12.2.9 OSD, and you probably should not
upgrade to 12.2.10, especially while a backfill is running.
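To rule out anything upgrade-related, a read-only BlueStore
consistency check might tell you more. A minimal sketch, assuming the
OSD is stopped and uses the default data path shown in your log:

  # offline check of the BlueStore metadata (the OSD must not be running)
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1

And before upgrading anything else, it's worth confirming the cluster
state first:

  ceph -s          # wait until backfill/recovery is done and all PGs are active+clean
  ceph versions    # see which daemons still run which release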
Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, 27 Nov 2018 at 23:04, Cassiano Pilipavicius
<cassiano@xxxxxxxxxxx> wrote:
>
> Hi, I am facing a problem where an OSD won't start after moving to a new
> node with 12.2.10 (the old one had 12.2.8).
>
> One node of my cluster failed and I tried to move 3 OSDs to a new node.
> 2 of the 3 OSDs have started and are running fine at the moment
> (backfilling is still in place), but one of the OSDs just doesn't start,
> with the following error in the logs (writing mostly to find out whether
> this is a bug or whether I have done something wrong):
>
> 2018-11-27 19:44:38.013454 7fba0d35fd80 -1
> bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0xb1a184d1, expected 0xb682fc52, device
> location [0x10000~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
> 2018-11-27 19:44:38.013501 7fba0d35fd80 -1 osd.1 0 OSD::init() : unable
> to read osd superblock
> 2018-11-27 19:44:38.013511 7fba0d35fd80  1
> bluestore(/var/lib/ceph/osd/ceph-1) umount
> 2018-11-27 19:44:38.065478 7fba0d35fd80  1 stupidalloc 0x0x55ebb04c3f80
> shutdown
> 2018-11-27 19:44:38.077261 7fba0d35fd80  1 freelist shutdown
> 2018-11-27 19:44:38.077316 7fba0d35fd80  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/rocksdb/db/db_impl.cc:217]
> Shutdown: canceling all background work
> 2018-11-27 19:44:38.077982 7fba0d35fd80  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/rocksdb/db/db_impl.cc:343]
> Shutdown complete
> 2018-11-27 19:44:38.107923 7fba0d35fd80  1 bluefs umount
> 2018-11-27 19:44:38.108248 7fba0d35fd80  1 stupidalloc 0x0x55ebb01cddc0
> shutdown
> 2018-11-27 19:44:38.108302 7fba0d35fd80  1 bdev(0x55ebb01cf800
> /var/lib/ceph/osd/ceph-1/block) close
> 2018-11-27 19:44:38.362984 7fba0d35fd80  1 bdev(0x55ebb01cf600
> /var/lib/ceph/osd/ceph-1/block) close
> 2018-11-27 19:44:38.470791 7fba0d35fd80 -1 ** ERROR: osd init failed:
> (22) Invalid argument
>
> My cluster has too many mixed versions; I hadn't realized that the
> versions changed when running a yum update, and right now I have the
> following situation (output of ceph versions):
>
> {
>     "mon": {
>         "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 1,
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 2
>     },
>     "mgr": {
>         "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 1
>     },
>     "osd": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94)
> luminous (stable)": 2,
>         "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 18,
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 27,
>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> luminous (stable)": 1
>     },
>     "mds": {},
>     "overall": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94)
> luminous (stable)": 2,
>         "ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
> luminous (stable)": 20,
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 29,
>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> luminous (stable)": 1
>     }
> }
>
> Is there an easy way to get the OSD working again? I am thinking about
> waiting for the backfill/recovery to finish, then upgrading all nodes to
> 12.2.10 and, if the OSD doesn't come up, recreating the OSD.
>
> Regards,
> Cassiano Pilipavicius.
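If it does come to recreating the OSD once backfill has finished, the
usual Luminous flow is roughly the following. This is only a sketch:
/dev/sdX is a placeholder for the actual data device, and it assumes a
ceph-volume deployment (adapt accordingly if the OSD was created with
ceph-disk):

  # check that all PGs can survive losing this OSD before touching it
  ceph osd safe-to-destroy osd.1

  # mark it destroyed, keeping its id and CRUSH position for reuse
  ceph osd destroy 1 --yes-i-really-mean-it

  # wipe the device and redeploy, reusing the old id
  ceph-volume lvm zap /dev/sdX
  ceph-volume lvm create --osd-id 1 --data /dev/sdX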