On Thu, 31 Jan 2019, Yury Z wrote:
> Hi,
>
> We've experimented with running OSDs in docker containers, and hit a
> situation where two OSDs were started with the same block device. The
> file locks inside the mounted osd dir didn't catch the issue because
> the mounted osd dirs were inside the containers. As a result we got a
> corrupted osd_superblock on the OSD's bluestore drive, and now the OSD
> can't be started.

AHA!  Someone else ran into this and it was a mystery to me how this
happened.  How did you identify the locks as the culprit?  And can you
describe the situation that led to two competing containers running
ceph-osd?

> # /usr/bin/ceph-osd -d --cluster ceph --id 74
> 2019-01-31 15:12:31.889211 7f6ae7fdee40 -1
> bluestore(/var/lib/ceph/osd/ceph-74) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0xd4daeff6, expected 0xda9c1ef0,
> device location [0x4000~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
> 2019-01-31 15:12:31.889227 7f6ae7fdee40 -1 osd.74 0 OSD::init() :
> unable to read osd superblock
> 2019-01-31 15:12:32.508923 7f6ae7fdee40 -1 ** ERROR: osd init failed:
> (22) Invalid argument
>
> We've tried to fix it with ceph-bluestore-tool, but it didn't help.
>
> # ceph-bluestore-tool repair --deep 1 --path /var/lib/ceph/osd/ceph-74
> repair success
>
> Is it possible to fix a corrupted osd superblock?

Maybe... it's hard to tell.  I think the next step is to add a bluestore
option that warns on crc errors but ignores them.  With that option set,
we can run an fsck on the OSD to see how much damage there really is,
and potentially export critical PGs that you need to recover.

What version of Ceph are you running?

sage
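
For readers following the locking point above, here is a minimal sketch
(using flock(1) from util-linux, not Ceph's actual startup code) of why
file locks inside container-private osd dirs cannot serialize two OSDs
that share one block device: the lock files are different inodes on
different paths, so the locks never conflict, whereas a lock taken on
the shared device node itself would.  Paths and the device name are
made up for illustration.

  # locks on files in two container-private osd dirs: different inodes,
  # so both commands acquire their lock and both OSDs would start
  flock -n /srv/container-a/osd-74/fsid -c 'sleep 300' &
  flock -n /srv/container-b/osd-74/fsid -c 'sleep 300' &   # also succeeds

  sleep 1

  # a lock on the shared block device node does conflict; the second
  # attempt exits non-zero instead of letting a second OSD run
  flock -n /dev/sdb -c 'sleep 300' &
  sleep 1
  flock -n /dev/sdb -c 'echo never reached'   # fails: device already locked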
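
And for the inspect-then-export step described in the reply, a rough
command sequence with the OSD stopped.  This assumes the store can be
opened at all (e.g. once an ignore-crc-style option is available and
set); the pgid and target paths are illustrative, so check the --help
output of both tools for your release.

  # deep fsck to gauge how much beyond the superblock is damaged
  ceph-bluestore-tool fsck --deep 1 --path /var/lib/ceph/osd/ceph-74

  # if the PG data is still readable, export the PGs you need and
  # import them into a healthy OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-74 --op list-pgs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-74 \
      --pgid 2.1f --op export --file /backup/pg-2.1f.export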