On Fri, 2020-02-14 at 07:13 -0800, Yiming Zhang wrote:
> > On Feb 13, 2020, at 3:52 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > If the OSD daemon dies, then it will have closed all of its fd's and
> > there should be no more lock. Therefore you almost certainly have some
> > other process running that is holding the lock.
> >
> > You may have to do a bit of digging in /proc/locks. Determine the
> > dev+inode number of the file on which the lock is being set and find it
> > in /proc/locks. Then you can track down the PID that's holding that
> > lock.
> >
>
> I have checked the locks with lslocks, here is the locks when I vstarted ceph (bluestore block = /dev/sdc where sdc is a raw device):
> COMMAND    PID  TYPE SIZE MODE  M START END PATH
> ceph-mgr 19852 POSIX      WRITE 0     0   0 /...
> iscsid    1061 POSIX      WRITE 0     0   0 /run...
> ceph-mgr 14889 POSIX      WRITE 0     0   0 /...
> rpcbind    990 FLOCK      WRITE 0     0   0 /run...
> ceph-mon 16430 POSIX      WRITE 0     0   0 /...
> ceph-mon 16430 POSIX      WRITE 0     0   0 /...
> ceph-mon 18107 POSIX      WRITE 0     0   0 /...
> ceph-mon 18107 POSIX      WRITE 0     0   0 /...
> ceph-mon 19711 POSIX      WRITE 0     0   0 /...
> ceph-mon 19711 POSIX      WRITE 0     0   0 /...
> ceph-mon 10495 POSIX      WRITE 0     0   0 /...
> ceph-mon 10495 POSIX      WRITE 0     0   0 /...
> ceph-mon 14748 POSIX      WRITE 0     0   0 /...
> ceph-mon 14748 POSIX      WRITE 0     0   0 /...
> cron      1085 FLOCK      WRITE 0     0   0 /run...
> ceph-mgr 18247 POSIX      WRITE 0     0   0 /...
> atd       1111 POSIX      WRITE 0     0   0 /run...
> lvmetad    807 POSIX      WRITE 0     0   0 /run...
> ceph-mgr 10635 POSIX      WRITE 0     0   0 /...
> ceph-mgr 16571 POSIX      WRITE 0     0   0 /…
>
> Then I kill all related processes and restart cluster, the error “_lock flock failed on /users/xxx/ceph/build/dev/osd0/block” persists.
>
> After the kill, locks are:
> COMMAND    PID  TYPE SIZE MODE  M START END PATH
> rpcbind  20267 FLOCK      WRITE 0     0   0 /run...
> lvmetad  20266 POSIX      WRITE 0     0   0 /run…
>
> The error happens in KernelDevice.cc:
> int r = ::flock(fd_directs[WRITE_LIFE_NOT_SET], LOCK_EX | LOCK_NB);
> Where r gives -1, and fd_directs[WRITE_LIFE_NOT_SET] will give 11, and WRITE_LIFE_NOT_SET is 0.
>
> Any suggestions how to proceed with the issue?
>

Sorry, no. Any lock set on a block device should show up in /proc/locks
(as it uses the kernel's generic flock lock mechanism for local
filesystems).

You may want to play with strace and verify that the error is coming
from the kernel and that the program is attempting to set the lock on
the file you think it is.

What kernel is this running on?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
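
For reference, the dev+inode lookup in /proc/locks suggested earlier in the thread can be sketched roughly as below. This is only an illustration, not code from Ceph: the names `lock_key`, `lock_key_for_path`, `match_line`, `find_holders` and `LockEntry` are made up for the example. It assumes the usual /proc/locks line layout (`ID: TYPE KIND MODE PID MAJ:MIN:INODE START END`, major/minor in hex, inode in decimal) and that stat() on the path, which follows the osd0/block symlink, reports the same device and inode that the kernel prints for the lock.

```cpp
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Format a device/inode pair the way /proc/locks is assumed to print it:
// hex major, hex minor, decimal inode, e.g. "08:20:1234".
std::string lock_key(unsigned maj, unsigned min, unsigned long long ino) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%02x:%02x:%llu", maj, min, ino);
    return buf;
}

// Build the key for a path. stat() follows symlinks, so a symlink to a
// device node resolves to the node itself. Returns "" if stat() fails.
std::string lock_key_for_path(const std::string& path) {
    struct stat st;
    if (::stat(path.c_str(), &st) != 0) return "";
    return lock_key(major(st.st_dev), minor(st.st_dev),
                    static_cast<unsigned long long>(st.st_ino));
}

// Hypothetical record for one matching /proc/locks line.
struct LockEntry {
    std::string type;   // e.g. FLOCK, POSIX
    std::string mode;   // e.g. READ, WRITE
    long pid;           // PID recorded by the kernel for the lock
    bool waiting;       // true if the line is a blocked waiter ("->")
};

// Parse one line and report whether its MAJ:MIN:INODE field equals key.
bool match_line(const std::string& line, const std::string& key, LockEntry& out) {
    std::istringstream in(line);
    std::string id, type, kind, mode, dev_ino;
    long pid = 0;
    bool waiting = false;
    if (!(in >> id >> type)) return false;
    if (type == "->") {            // blocked requests carry a "->" marker
        waiting = true;
        if (!(in >> type)) return false;
    }
    if (!(in >> kind >> mode >> pid >> dev_ino)) return false;
    if (dev_ino != key) return false;
    out = LockEntry{type, mode, pid, waiting};
    return true;
}

// Collect every entry in a /proc/locks-style stream that refers to key.
std::vector<LockEntry> find_holders(std::istream& locks, const std::string& key) {
    std::vector<LockEntry> found;
    std::string line;
    LockEntry entry;
    while (std::getline(locks, line)) {
        if (match_line(line, key, entry)) found.push_back(entry);
    }
    return found;
}
```

To use it, open /proc/locks with a `std::ifstream` and pass it to `find_holders` together with `lock_key_for_path("/users/xxx/ceph/build/dev/osd0/block")`. Any entry that comes back names a PID to inspect. An empty result while flock() still fails would suggest checking with strace which file descriptor and path the lock call is actually made on.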