Re: Is it possible to fix corrupted osd superblock?


 



On Thu, 31 Jan 2019 15:51:27 +0000 (UTC)
Sage Weil <sage@xxxxxxxxxxxx> wrote:

> On Thu, 31 Jan 2019, Yury Z wrote:
> > Hi,
> > 
> > We've experimented with running OSDs in docker containers, and got
> > into the situation where two OSDs started with the same block
> > device. File locks inside the mounted osd dir didn't catch that
> > issue because the mounted osd dirs were inside containers. So, we
> > got a corrupted osd_superblock on the osd bluestore drive, and now
> > the OSD can't be started.  
> 
> AHA!  Someone else ran into this and it was a mystery to me how this 
> happened.  How did you identify locks as the culprit?  And can you 
> describe the situation that led to two competing containers running 
> ceph-osd?

We are running Ceph version 12.2.8 on Ubuntu 18.04.

1) Prepare the osd lvm block device with the ceph-volume tool
# ceph-volume lvm prepare --bluestore --data /dev/sdf
This produced the block device
/dev/ceph-41fe3fb3-f2ee-47dc-843a-8e55898853bb/osd-block-ea4423a5-850d-49ca-96d7-68c10dd11025
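
To double-check what the prepare step created, the mapping between the OSD
id and the lvm device can be listed on the host; this is just a sanity
check, not a required step:
# ceph-volume lvm list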

2) Pass the /dev filesystem and the block device name to the docker container
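
For reference, the container is started along these lines; the image name
and the environment variable are placeholders, not our exact invocation:
# docker run -d --name osd-74 \
    --privileged \
    -v /dev:/dev \
    -e OSD_ID=74 \
    our-ceph-osd-image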

We run all of the following steps inside the container:

3) Create the osd dir inside the container
# mkdir /var/lib/ceph/osd/ceph-74
This osd dir is inside the container, so it is unreachable from other
containers.

4) Prime (assign) the block device to the osd dir with ceph-bluestore-tool
# ceph-bluestore-tool prime-osd-dir \
    --dev /dev/ceph-41fe3fb3-f2ee-47dc-843a-8e55898853bb/osd-block-ea4423a5-850d-49ca-96d7-68c10dd11025 \
    --path /var/lib/ceph/osd/ceph-74/
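
After priming, the osd dir only contains small metadata files plus a block
symlink pointing back at the shared lvm device, which can be verified with:
# ls -l /var/lib/ceph/osd/ceph-74/block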

5) Run ceph-osd inside the container
# /usr/bin/ceph-osd -d --cluster ceph --id 74

6) ceph-osd takes locks on files inside the osd dir and starts rw
operations on the block device.
# ls /var/lib/ceph/osd/ceph-74/
block  ceph_fsid  fsid  keyring  ready  type  whoami
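
As far as we can tell, the lock in question is the flock that ceph-osd
takes on the fsid file in this dir. Inside the container it can be seen
with something like the following (lslocks is from util-linux; the exact
output will vary):
# lslocks -p $(pidof ceph-osd)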

7) Any other ceph-osd container can be started the same way, with the
same block device name passed in.

Since each ceph-osd container has its own osd dir with its own lock
files, the OSDs can't detect each other or prevent simultaneous rw
operations on the same block device.
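
One crude guard would be to wrap the daemon in an flock on the device node
itself, since /dev is the one path that all the containers actually share.
This is only a sketch of the idea, not something ceph-osd does by itself:
# flock --exclusive --nonblock \
    /dev/ceph-41fe3fb3-f2ee-47dc-843a-8e55898853bb/osd-block-ea4423a5-850d-49ca-96d7-68c10dd11025 \
    /usr/bin/ceph-osd -d --cluster ceph --id 74
With /dev bind-mounted into every container, both ceph-osd instances see
the same device node, so the second flock fails instead of letting a
second writer corrupt the store.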

> > Is it possible to fix corrupted osd superblock?  
> 
> Maybe.. it's hard to tell.  I think the next step is to add a
> bluestore option to warn on crc errors but to ignore them.  With
> that option set, we can run a fsck on the OSD to see how much damage
> there really is, and potentially export critical PGs that you need to
> recover.
> 
> What version of Ceph are you running?

We are running Ceph version 12.2.8 on Ubuntu 18.04.
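
Once an option to ignore the crc errors exists, we would expect the
follow-up to be the usual fsck and per-PG export, roughly like this (the
pgid and output file are placeholders):
# ceph-bluestore-tool fsck --deep --path /var/lib/ceph/osd/ceph-74
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-74 \
    --pgid 1.2a --op export --file /tmp/pg.1.2a.export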


