On Thu, 31 Jan 2019 15:51:27 +0000 (UTC) Sage Weil <sage@xxxxxxxxxxxx> wrote:

> On Thu, 31 Jan 2019, Yury Z wrote:
> > Hi,
> >
> > We've experimented with running OSDs in docker containers. And got
> > the situation when two OSDs started with the same block device.
> > File locks inside the mounted osd dir didn't catch that issue
> > because the mounted osd dirs were inside containers. So, we got a
> > corrupted osd_superblock on the OSD's bluestore drive. And now the
> > OSD can't be started.
>
> AHA! Someone else ran into this and it was a mystery to me how this
> happened. How did you identify locks as the culprit? And can you
> describe the situation that led to two competing containers running
> ceph-osd?

We are running Ceph version 12.2.8 on Ubuntu 18.04.

1) Prepare the OSD LVM block device with the ceph-volume tool:

# ceph-volume lvm prepare --bluestore --data /dev/sdf

This gave us the block device
/dev/ceph-41fe3fb3-f2ee-47dc-843a-8e55898853bb/osd-block-ea4423a5-850d-49ca-96d7-68c10dd11025

2) Pass the /dev filesystem and the block device name into the docker
container (a docker run sketch is at the end of this mail). All the
following steps are run inside the container.

3) Create the osd dir inside the container:

# mkdir /var/lib/ceph/osd/ceph-74

This osd dir inside the container is unreachable from other containers.

4) Prime (assign) the block device to the osd dir with ceph-bluestore-tool:

# ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-41fe3fb3-f2ee-47dc-843a-8e55898853bb/osd-block-ea4423a5-850d-49ca-96d7-68c10dd11025 --path /var/lib/ceph/osd/ceph-74/

5) Run ceph-osd inside the container:

# /usr/bin/ceph-osd -d --cluster ceph --id 74

6) ceph-osd takes locks on files inside the osd dir and starts rw
operations on the block device.

# ls /var/lib/ceph/osd/ceph-74/
block  ceph_fsid  fsid  keyring  ready  type  whoami

7) Any other ceph-osd container can be started the same way with the
same block device name passed in. Since each ceph-osd container has its
own osd dir with its own lock files, the instances can't detect each
other and can't prevent simultaneous rw operations on the same block
device (see the flock sketch at the end of this mail).

> > Is it possible to fix corrupted osd superblock?
>
> Maybe.. it's hard to tell. I think the next step is to add a
> bluestore option to warn on crc errors but to ignore them. With
> that option set, we can run a fsck on the OSD to see how much damage
> there really is, and potentially export critical PGs that you need to
> recover.
>
> What version of Ceph are you running?

We are running Ceph version 12.2.8 on Ubuntu 18.04.
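
For step 2, the container gets the host's /dev bind-mounted so the LVM
device node is visible inside it, roughly along these lines (a
simplified sketch only; <osd-image> is a placeholder and the real
invocation has more options, the relevant part is -v /dev:/dev):

# docker run -d --name ceph-osd-74 --privileged -v /dev:/dev <osd-image>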
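
For step 7, one guard we could add ourselves is an exclusive flock(1)
on the shared LV node before starting ceph-osd. Because /dev comes from
the host, every container sees the same device inode, so a second
container would fail to take the lock instead of writing to the device.
This is only a sketch of the idea, not something ceph-osd 12.2.8 does
by itself:

# flock --exclusive --nonblock \
      /dev/ceph-41fe3fb3-f2ee-47dc-843a-8e55898853bb/osd-block-ea4423a5-850d-49ca-96d7-68c10dd11025 \
      /usr/bin/ceph-osd -d --cluster ceph --id 74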
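
On the fsck: once an "ignore crc errors" option like the one you
describe exists, we would expect to point the existing fsck at the
primed osd dir, something like (sketch, using the dir from step 4):

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-74

and, if that looks sane, try exporting the PGs we need with
ceph-objectstore-tool.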