On Fri, 1 Feb 2019, Yury Z wrote:
> On Thu, 31 Jan 2019 23:27:21 +0000 (UTC)
> Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > On Thu, 31 Jan 2019, Sage Weil wrote:
> > > On Thu, 31 Jan 2019, Yury Z wrote:
> > > > Hi,
> > > >
> > > > We've experimented with running OSDs in docker containers, and
> > > > got into a situation where two OSDs were started with the same
> > > > block device. The file locks inside the mounted osd dirs didn't
> > > > catch the issue because the mounted osd dirs were inside the
> > > > containers. So we got a corrupted osd_superblock on the OSD's
> > > > bluestore drive, and now the OSD can't be started.
> > >
> > > AHA!  Someone else ran into this and it was a mystery to me how
> > > this happened.  How did you identify locks as the culprit?  And
> > > can you describe the situation that led to two competing
> > > containers running ceph-osd?
> >
> > I looked into this a bit and I'm not sure competing docker
> > containers explains the issue.  The bluestore code takes an fcntl
> > lock on the block device when it opens it, before doing anything at
> > all, and I *think* those should work just fine across the container
> > boundaries.
>
> As far as I can see, the bluestore code takes an fcntl lock on the
> "fsid" file inside the osd dir, not on the block device (the
> BlueStore::_lock_fsid method). In our case we have the same block
> device but different osd dirs for each ceph-osd docker container, so
> they can't detect each other and prevent simultaneous rw operations
> on the same block device.

KernelDevice.cc *also* takes a lock on the block device itself, which
should be the same inode across any containers.  I'm trying to figure
out why that lock isn't working, though... :/

sage
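
P.S. For anyone who wants to poke at this outside of ceph-osd, below is a
minimal sketch of the kind of advisory fcntl write lock being discussed.
It is not the actual Ceph code, and the device path is only an example.
Run one copy in each container against the same device node; if the second
copy is refused, the lock itself works across the container boundary and
the problem is somewhere else.

    // fcntl_lock_check.cc -- illustrative sketch, not KernelDevice.cc.
    // Takes an exclusive advisory fcntl lock on a block device node and
    // holds it so a second instance can test for a conflict.
    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char** argv)
    {
      // Example path only; pass the real device as argv[1].
      const char* path = argc > 1 ? argv[1] : "/dev/sdb";

      int fd = ::open(path, O_RDWR);
      if (fd < 0) {
        fprintf(stderr, "open %s failed: %s\n", path, strerror(errno));
        return 1;
      }

      // Advisory write lock over the whole device.  fcntl locks are kept
      // by the kernel on the inode, so two processes locking the same
      // device node should conflict even from different containers.
      struct flock l;
      memset(&l, 0, sizeof(l));
      l.l_type = F_WRLCK;
      l.l_whence = SEEK_SET;
      l.l_start = 0;
      l.l_len = 0;  // 0 means "to end of file", i.e. the whole device

      if (::fcntl(fd, F_SETLK, &l) < 0) {
        fprintf(stderr, "lock on %s refused: %s (another holder?)\n",
                path, strerror(errno));
        return 2;
      }

      printf("got exclusive lock on %s; holding until killed\n", path);
      pause();  // keep the lock alive for the second instance to test
      return 0;
    }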