> On Feb 13, 2020, at 3:52 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> If the OSD daemon dies, then it will have closed all of its fd's and
> there should be no more lock. Therefore you almost certainly have some
> other process running that is holding the lock.
>
> You may have to do a bit of digging in /proc/locks. Determine the
> dev+inode number of the file on which the lock is being set and find it
> in /proc/locks. Then you can track down the PID that's holding that
> lock.
>

I have checked the locks with lslocks. Here are the locks right after I vstarted Ceph (bluestore block = /dev/sdc, where sdc is a raw device):

COMMAND PID TYPE SIZE MODE M START END PATH
ceph-mgr 19852 POSIX WRITE 0 0 0 /...
iscsid 1061 POSIX WRITE 0 0 0 /run...
ceph-mgr 14889 POSIX WRITE 0 0 0 /...
rpcbind 990 FLOCK WRITE 0 0 0 /run...
ceph-mon 16430 POSIX WRITE 0 0 0 /...
ceph-mon 16430 POSIX WRITE 0 0 0 /...
ceph-mon 18107 POSIX WRITE 0 0 0 /...
ceph-mon 18107 POSIX WRITE 0 0 0 /...
ceph-mon 19711 POSIX WRITE 0 0 0 /...
ceph-mon 19711 POSIX WRITE 0 0 0 /...
ceph-mon 10495 POSIX WRITE 0 0 0 /...
ceph-mon 10495 POSIX WRITE 0 0 0 /...
ceph-mon 14748 POSIX WRITE 0 0 0 /...
ceph-mon 14748 POSIX WRITE 0 0 0 /...
cron 1085 FLOCK WRITE 0 0 0 /run...
ceph-mgr 18247 POSIX WRITE 0 0 0 /...
atd 1111 POSIX WRITE 0 0 0 /run...
lvmetad 807 POSIX WRITE 0 0 0 /run...
ceph-mgr 10635 POSIX WRITE 0 0 0 /...
ceph-mgr 16571 POSIX WRITE 0 0 0 /…

Then I killed all the related processes and restarted the cluster, but the error “_lock flock failed on /users/xxx/ceph/build/dev/osd0/block” persists. After the kill, the remaining locks are:

COMMAND PID TYPE SIZE MODE M START END PATH
rpcbind 20267 FLOCK WRITE 0 0 0 /run...
lvmetad 20266 POSIX WRITE 0 0 0 /run…

The error comes from this line in KernelDevice.cc:

int r = ::flock(fd_directs[WRITE_LIFE_NOT_SET], LOCK_EX | LOCK_NB);

where r is -1, fd_directs[WRITE_LIFE_NOT_SET] is 11, and WRITE_LIFE_NOT_SET is 0.
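In case it is useful, here is a minimal standalone probe along the lines of Jeff's suggestion above (just a sketch, not Ceph code; the flock_probe.cc name and its messages are made up for illustration). It takes the same LOCK_EX | LOCK_NB flock() that KernelDevice::_lock() takes and, if that fails with EAGAIN, prints the dev:inode key to look up in /proc/locks:

// flock_probe.cc -- illustrative only, not part of the Ceph tree.
// Build: g++ -Wall -o flock_probe flock_probe.cc
// Usage: ./flock_probe <path to the osd block file or raw device>

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

int main(int argc, char** argv)
{
  if (argc < 2) {
    fprintf(stderr, "usage: %s <block file or device>\n", argv[0]);
    return 2;
  }

  // Read-only is enough; flock() conflicts do not depend on the open mode.
  int fd = ::open(argv[1], O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    fprintf(stderr, "open %s: %s\n", argv[1], strerror(errno));
    return 1;
  }

  // Same call KernelDevice::_lock() makes: exclusive, non-blocking.
  if (::flock(fd, LOCK_EX | LOCK_NB) < 0) {
    int err = errno;  // 11 is EAGAIN/EWOULDBLOCK: something else holds the lock
    struct stat st;
    if (::fstat(fd, &st) == 0) {
      // /proc/locks identifies the locked file as MAJOR:MINOR:INODE,
      // with major/minor in hex, so this string can be grepped directly.
      fprintf(stderr, "flock failed: %s; grep '%02x:%02x:%llu' /proc/locks\n",
              strerror(err),
              (unsigned)major(st.st_dev), (unsigned)minor(st.st_dev),
              (unsigned long long)st.st_ino);
    }
    ::close(fd);
    return 1;
  }

  printf("got the exclusive lock; nothing else is holding it\n");
  ::flock(fd, LOCK_UN);
  ::close(fd);
  return 0;
}

If the grep matches a line in /proc/locks, the PID field on that line should point at whatever still has the block file open, and lsof on the same path should show the same process.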
Any suggestions on how to proceed with this issue?

Thanks,
-ym

> Cheers,
> Jeff
>
> On Wed, 2020-02-12 at 09:03 -0800, Yiming Zhang wrote:
>> The weird thing is I don’t have systemd-udev installed on my server.
>> Are there any other possible solutions?
>>
>> The error only happens when I redirect the osd data to a raw device.
>>
>> Thanks,
>> Yiming
>>
>>> On Feb 12, 2020, at 8:36 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>
>>> Talib was chasing down a similar issue a while back and found that the
>>> root cause was systemd-udev, which spawns a process that opens the device
>>> after it is closed. You might try removing or disabling that package
>>> and see if it goes away?
>>>
>>>
>>> On Wed, 12 Feb 2020, Yiming Zhang wrote:
>>>
>>>> Hi All,
>>>>
>>>> I noticed a locking issue in the kernel device.
>>>> When I stop the Ceph cluster and all the daemons, the kernel device _lock is somehow still held, and the line below returns r < 0:
>>>>
>>>> int KernelDevice::_lock()
>>>> {
>>>>   int r = ::flock(fd_directs[WRITE_LIFE_NOT_SET], LOCK_EX | LOCK_NB);
>>>>   …
>>>> }
>>>>
>>>> The way I stop the cluster and daemons:
>>>>
>>>> sudo ../src/stop.sh
>>>> sudo bin/init-ceph --verbose forcestop
>>>>
>>>> This error happens even after a reboot, when I try to use vstart:
>>>>
>>>> bdev _lock flock failed on ceph/build/dev/osd0/block
>>>> bdev open failed to lock /home/yzhan298/ceph/build/dev/osd0/block: (11) Resource temporarily unavailable
>>>> OSD::mkfs: couldn't mount ObjectStore: error (11) Resource temporarily unavailable
>>>> ** ERROR: error creating empty object store in ceph/build/dev/osd0: (11) Resource temporarily unavailable
>>>>
>>>> Please advise. (On the master branch.)
>>>>
>>>> Thanks,
>>>> Yiming
>>>> _______________________________________________
>>>> Dev mailing list -- dev@xxxxxxx
>>>> To unsubscribe send an email to dev-leave@xxxxxxx
>> _______________________________________________
>> Dev mailing list -- dev@xxxxxxx
>> To unsubscribe send an email to dev-leave@xxxxxxx
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx