Hi Dan,
one of our customers reported practically the same issue with fcntl
locks, but without negative PIDs:
0807178332093/mailboxes/Spam/rbox-Mails/dovecot.index.log (WRITE lock held by pid 25164)
0807178336211/mailboxes/INBOX/rbox-Mails/dovecot.index.log (WRITE lock held by pid 8143)
These errors occurred during failure tests in which the underlying MDS
servers were shut off. Restarting dovecot was enough to get rid of the
errors. The mounted dovecot directories are pinned to specific MDS
daemons; the environment is not in production, though.
Since this was the first time we saw these errors and the root cause was
a disaster scenario, we didn't take the time to investigate. So I can't
share much beyond confirming the issue for now; maybe this topic will
come up again.
Regards,
Eugen
Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:
Hi,
Yeah, the negative pid is interesting. AFAICT the kernel uses a negative
pid to indicate that the lock is held on another host:
https://github.com/torvalds/linux/blob/master/fs/ceph/locks.c#L119
https://github.com/torvalds/linux/commit/9d5b86ac13c573795525ecac6ed2db39ab23e2a8
"Finally, we convert remote filesystems to present remote pids using
negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote
pid returned for F_GETLK lock requests."
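To illustrate that convention, here is a minimal C sketch, assuming Linux and a kernel that includes the change above. It asks the kernel about a conflicting lock with F_GETLK and treats a negative l_pid as "held by a process on another host". The helper names probe_lock, classify_lock_pid and report_lock are hypothetical, made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Who owns a conflicting lock, as reported by F_GETLK. */
enum lock_owner { OWNER_NONE, OWNER_LOCAL, OWNER_REMOTE };

/*
 * Classify the l_type/l_pid pair returned by F_GETLK. Network filesystems
 * (ceph, cifs, 9p, lustre, dlm) negate the pid of a lock held on another
 * host, so a negative value cannot be found with ps on this machine.
 */
enum lock_owner classify_lock_pid(short l_type, pid_t l_pid)
{
    if (l_type == F_UNLCK)
        return OWNER_NONE;
    return l_pid < 0 ? OWNER_REMOTE : OWNER_LOCAL;
}

/*
 * Ask whether a lock of `type` (F_RDLCK or F_WRLCK) on [start, start+len)
 * would conflict. On success *out describes the conflicting lock, or has
 * l_type == F_UNLCK if there is none. Returns 0, or -1 with errno set.
 */
int probe_lock(int fd, short type, off_t start, off_t len, struct flock *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    out->l_type = type;
    out->l_whence = SEEK_SET;
    out->l_start = start;
    out->l_len = len;           /* 0 means "up to end of file" */
    return fcntl(fd, F_GETLK, out);
}

/* Print a human-readable description of the probe result. */
void report_lock(const char *path, const struct flock *fl)
{
    switch (classify_lock_pid(fl->l_type, fl->l_pid)) {
    case OWNER_NONE:
        printf("%s: no conflicting lock\n", path);
        break;
    case OWNER_LOCAL:
        printf("%s: lock held by local pid %d\n", path, (int)fl->l_pid);
        break;
    case OWNER_REMOTE:
        /* Only the remote pid is known, not which client holds it. */
        printf("%s: lock held on another host (remote pid %d)\n",
               path, (int)-fl->l_pid);
        break;
    }
}
```

With the lock from this thread, probe_lock(fd, F_WRLCK, 0, 1, &fl) followed by report_lock(path, &fl) should report a remote pid of 9605. Note that struct flock carries no host or client field, so F_GETLK alone cannot say which client holds the lock; that is the gap the ceph_filelock client id would fill.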
The good news is that my colleagues managed to clear this file lock by
restarting dovecot on a couple of nodes.
But I'm still curious if others have a nice way to debug such things.
Cheers, Dan
On Mon, Nov 9, 2020 at 8:11 PM Anthony D'Atri
<anthony.datri@xxxxxxxxx> wrote:
Looks like a - in front of the 9605. Signed/unsigned int flern?
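A quick arithmetic check of the signed/unsigned theory: if -9605 were a 32-bit unsigned value printed as signed, the underlying value would have to be

```latex
(\mathtt{uint32\_t})(-9605) = 2^{32} - 9605 = 4294957691 \gg 2^{22} = 4194304
```

where 2^22 is PID_MAX_LIMIT on 64-bit Linux. No real pid can be that large, so a plain wraparound is unlikely; a negated remote pid of 9605 fits the observation more directly.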
> On Nov 9, 2020, at 4:59 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> MDS version v14.2.11
> Client kernel 3.10.0-1127.19.1.el7.x86_64
>
> We are seeing a strange issue with a dovecot use-case on cephfs.
> Occasionally dovecot reports a file as locked, for example:
>
> Nov 09 13:55:00 dovecot-backend-00.cern.ch dovecot[27710]:
> imap(reguero)<23945><fRA6B6yznq68uE28>: Error: Mailbox Deleted Items:
> Timeout (180s) while waiting for lock for transaction log file
> /mail/users/r/reguero//mdbox/mailboxes/Deleted
> Items/dbox-Mails/dovecot.index.log (WRITE lock held by pid -9605)
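The "Timeout (180s) while waiting for lock" message suggests a bounded wait on an fcntl lock followed by a lookup of the holder. The C sketch below is a hypothetical illustration of that pattern, not Dovecot's actual code: it polls F_SETLK until a deadline and, on timeout, uses F_GETLK to find the pid holding the lock, which is where a negative value like -9605 would surface on CephFS. The function name wait_write_lock, the 100 ms polling interval, and the single-byte range (chosen to mirror the Offset 0 / Len 1 in the print locks output further down) are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Fill *fl with a write-lock request for byte 0 of the file. */
static void first_byte_wrlck(struct flock *fl)
{
    memset(fl, 0, sizeof(*fl));
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 1;
}

/*
 * Hypothetical helper: poll F_SETLK every 100 ms until a write lock on
 * byte 0 is acquired or timeout_ms passes. Returns 0 on success (and
 * leaves *holder untouched). On timeout returns -1, sets errno to
 * ETIMEDOUT and stores in *holder the l_pid that F_GETLK reports for the
 * conflicting lock (0 if it could not be determined).
 */
int wait_write_lock(int fd, long timeout_ms, pid_t *holder)
{
    struct timespec start, pause = { 0, 100L * 1000 * 1000 };
    struct flock fl;

    assert(holder != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        first_byte_wrlck(&fl);
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;
        if (errno != EAGAIN && errno != EACCES)
            return -1;                  /* a real error, not contention */
        if (elapsed_ms(&start) >= timeout_ms)
            break;
        nanosleep(&pause, NULL);
    }

    /* Timed out: ask who holds it (negative on CephFS = another host). */
    first_byte_wrlck(&fl);
    *holder = (fcntl(fd, F_GETLK, &fl) == 0 && fl.l_type != F_UNLCK)
              ? fl.l_pid : 0;
    errno = ETIMEDOUT;
    return -1;
}
```

Calling wait_write_lock(fd, 180 * 1000L, &holder) would correspond to the 180 s timeout in the log. Polling keeps the sketch simple; a blocking F_SETLKW interrupted by alarm() is a common alternative.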
>
> We checked all hosts that have mounted the cephfs -- there is no pid 9605.
>
> Is there any way to see who exactly created the lock? ceph_filelock
> has a client id, but I didn't find a way to inspect the
> cephfs_metadata to see the ceph_filelock directly.
>
> Otherwise, are other Dovecot/CephFS users seeing this? Did you switch
> to flock or lockfile instead of fcntl locks?
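For comparison, taking an exclusive lock with flock(2) instead of fcntl looks roughly like the minimal C sketch below (try_flock_exclusive is a hypothetical name). As far as I know, Dovecot chooses between fcntl, flock and dotlock via its lock_method setting.

```c
#define _DEFAULT_SOURCE        /* flock() is BSD, not POSIX */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>             /* open(), for opening the file to lock */
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to take an exclusive whole-file flock(2) lock without blocking.
 * Returns 1 if acquired, 0 if another open file description holds a
 * conflicting lock, -1 on any other error.
 */
int try_flock_exclusive(int fd)
{
    assert(fd >= 0);
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return 1;
    return errno == EWOULDBLOCK ? 0 : -1;
}
```

Two differences matter for debugging. flock locks belong to the open file description rather than the process, so two separate open() calls on the same file conflict even within one process. And there is no F_GETLK equivalent, so a flock-based error cannot name the holder's pid at all. Whether flock behaves better than fcntl on CephFS after an MDS failure is not something this sketch can answer.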
>
> Thanks!
>
> Dan
>
> P.S. Here is the output of the print locks tool from the kernel client:
>
> Read lock:
> Type: 1 (0: Read, 1: Write, 2: Unlocked)
> Whence: 0 (0: start, 1: current, 2: end)
> Offset: 0
> Len: 1
> Pid: -9605
> Write lock:
> Type: 1 (0: Read, 1: Write, 2: Unlocked)
> Whence: 0 (0: start, 1: current, 2: end)
> Offset: 0
> Len: 1
> Pid: -9605
>
> and the same file from a 15.2.5 fuse client:
>
> Read lock:
> Type: 1 (0: Read, 1: Write, 2: Unlocked)
> Whence: 0 (0: start, 1: current, 2: end)
> Offset: 0
> Len: 0
> Pid: 0
> Write lock:
> Type: 1 (0: Read, 1: Write, 2: Unlocked)
> Whence: 0 (0: start, 1: current, 2: end)
> Offset: 0
> Len: 0
> Pid: 0
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
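The print locks tool itself isn't shown in the thread. Assuming it simply calls F_GETLK once with F_RDLCK and once with F_WRLCK and dumps the returned struct flock, the output maps onto the struct fields as in this hypothetical C formatter (format_flock is a made-up name). This also explains why the "Read lock" section shows Type: 1: F_GETLK returns the conflicting lock, here a write lock, not the type that was asked about. The numeric legends assume Linux values (F_RDLCK=0, F_WRLCK=1, F_UNLCK=2; SEEK_SET=0, SEEK_CUR=1, SEEK_END=2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: write a struct flock (as filled in by F_GETLK)
 * into buf using the same layout as the print locks output quoted above.
 * Returns the snprintf result (characters that would have been written).
 */
int format_flock(char *buf, size_t len, const char *label,
                 const struct flock *fl)
{
    assert(buf != NULL && fl != NULL);
    return snprintf(buf, len,
        "%s:\n"
        "Type: %d (0: Read, 1: Write, 2: Unlocked)\n"
        "Whence: %d (0: start, 1: current, 2: end)\n"
        "Offset: %lld\n"
        "Len: %lld\n"
        "Pid: %d\n",
        label, (int)fl->l_type, (int)fl->l_whence,
        (long long)fl->l_start, (long long)fl->l_len, (int)fl->l_pid);
}
```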