On 2/9/19 5:40 PM, Brad Hubbard wrote:
> On Sun, Feb 10, 2019 at 1:56 AM Ruben Rodriguez <ruben@xxxxxxx> wrote:
>>
>> Hi there,
>>
>> Running 12.2.11-1xenial on a machine with 6 SSD OSDs with bluestore.
>>
>> Today two disks dropped out of the controller, and after a reboot they
>> both seemed to come back fine, but ceph-osd was only able to start on
>> one of them. The other one gets this:
>>
>> 2019-02-08 18:53:00.703376 7f64f948ce00 -1
>> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
>> checksum at blob offset 0x0, got 0x95104dfc, expected 0xb9e3e26d, device
>> location [0x4000~1000], logical extent 0x0~1000, object
>> #-1:7b3f43c4:::osd_superblock:0#
>> 2019-02-08 18:53:00.703406 7f64f948ce00 -1 osd.3 0 OSD::init() : unable
>> to read osd superblock
>>
>> Note that there are no actual IO errors shown by the controller in
>> dmesg, and that the disk is readable. The metadata FS is mounted and
>> looks normal.
>>
>> I tried running "ceph-bluestore-tool repair --path
>> /var/lib/ceph/osd/ceph-3 --deep 1" and that gets many instances of:
>
> Running this with debug_bluestore=30 might give more information on
> the nature of the IO error.

I had already collected the logs with debug info, and nothing significant
was listed there.

I applied this patch https://github.com/ceph/ceph/pull/26247 and it
allowed me to move forward. There was an osd map corruption issue that I
had to handle by hand, but after that the osd started fine. Once it was
up and backfills finished, the bluestore_ignore_data_csum flag was no
longer needed, so I reverted to the standard packages.

--
Ruben Rodriguez | Chief Technology Officer, Free Software Foundation
GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
https://fsf.org | https://gnu.org
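[For anyone hitting the same _verify_csum error: the "bad crc32c/0x1000" line means BlueStore recomputed a CRC-32C over a 4 KiB (0x1000-byte) block and it did not match the checksum stored in the blob metadata. A minimal, illustrative CRC-32C (Castagnoli) sketch in Python shows the mechanism; Ceph's real implementation is table-driven/hardware-accelerated and seeds the CRC per blob, so this is only a sketch of the algorithm, not Ceph's code:

```python
# Bit-by-bit reflected CRC-32C (Castagnoli poly 0x1EDC6F41, reflected 0x82F63B78).
# Illustrative sketch only; BlueStore uses an optimized implementation.
def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C:
assert crc32c(b"123456789") == 0xE3069283

# How a mismatch is detected: the checksum recorded at write time no
# longer matches the one recomputed from the on-disk data.
block = b"\x00" * 0x1000            # a 4 KiB block, as in "crc32c/0x1000"
stored = crc32c(block)              # checksum recorded at write time
corrupted = b"\x01" + block[1:]     # a single flipped byte on disk
assert crc32c(corrupted) != stored  # -> _verify_csum would report "bad crc32c"
```

That is why bluestore_ignore_data_csum (added by the patch above) lets the OSD read the data anyway: the data may well be intact and only the stored checksum stale, but verify what you recover.]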
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com