client lock error

Vincent Régnard <vregnard@xxxxxxxxxxxxxxxx> · Mon, 05 Nov 2007 15:49:31 +0100

Hi all,

I am seeing strange reports in the glusterfs client log files.

My config is gluster 1.3.7/fuse2.7.0-glfs5, linux 2.6.16.55. We have 3
clients and 3 servers (1 client and 1 server on each host) on a 100Mb
network with 5ms round trip between clients and servers. The 3 clients
replicate with afr on the client side over the 3 servers (see the end of
this message for gluster stack details).

I have a cron script that runs every hour and takes the write lock on
the file cluster-root/var/lock/fs-stress.lock, located on the gluster
file system. The script runs fine and the locking mechanism seems to
work as expected. But I get strange error messages that I don't
understand in the glusterfs client logs:

On the client (op1) that actually takes the lock in this example, I
have only 2 error messages:

# on client (op1)
2007-11-05 13:15:32 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-op1-data)
op_ret=-1 op_errno=107
2007-11-05 13:15:32 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-or3-data)
op_ret=-1 op_errno=107

On the other clients, where the locking attempt fails as expected
because op1 already holds the lock on the file, I have 3 error
messages:

# on client (or2)
2007-11-05 13:15:37 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-or3-data)
op_ret=-1 op_errno=107
2007-11-05 13:15:37 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-op1-data)
op_ret=-1 op_errno=107
2007-11-05 13:15:37 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-or2-data)
op_ret=-1 op_errno=107

# on client (or3)
2007-11-05 13:16:17 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-or2-data)
op_ret=-1 op_errno=107
2007-11-05 13:16:17 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-op1-data)
op_ret=-1 op_errno=107
2007-11-05 13:16:17 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-or3-data)
op_ret=-1 op_errno=107
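
To compare which children fail on each client, the entries above can be
summarized with a short script. This is only a sketch: it assumes the
log format shown in this message, and the names ENTRY and
failed_children are hypothetical helpers, not part of glusterfs. For
reference, errno 107 on Linux is ENOTCONN ("Transport endpoint is not
connected").

```python
import os
import re

# Matches one afr_lk_cbk error entry. \s+ tolerates the line wrapping
# introduced by this mail; in the real log each entry is one line.
ENTRY = re.compile(
    r"afr_lk_cbk\]\s+(?P<xlator>\S+):\s+"
    r"\(path=(?P<path>\S+)\s+child=(?P<child>[^)\s]+)\)\s+"
    r"op_ret=(?P<ret>-?\d+)\s+op_errno=(?P<errno>\d+)"
)


def failed_children(log_text):
    """Return (child, errno) pairs for every failed lock callback."""
    return [
        (m.group("child"), int(m.group("errno")))
        for m in ENTRY.finditer(log_text)
        if int(m.group("ret")) < 0
    ]


sample = """2007-11-05 13:15:32 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-op1-data)
op_ret=-1 op_errno=107
2007-11-05 13:15:32 E [afr.c:3245:afr_lk_cbk] tbs-clust-data-afr:
(path=/cluster-root/var/lock/fs-stress.lock child=tbs-clust-or3-data)
op_ret=-1 op_errno=107
"""

for child, err in failed_children(sample):
    # os.strerror is platform dependent; the ENOTCONN meaning holds on Linux.
    print(child, err, os.strerror(err))
```

Run against each client's log, this lists the children for which the
lock call returned an error, which makes the op1 versus or2/or3
difference easier to see.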

Note that on client op1, which takes the lock, the error messages
refer to op1 and or3. It seems the lock is taken on server or2
without problem? According to the documentation, I have the lock
translator just above storage/posix on the server side. Should I have
locking on the client side? Is the server side really the appropriate
place for the locking translator?
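
For context, the locking step of the cron script is roughly equivalent
to the sketch below. It is an illustration only, not the actual script:
it assumes a non-blocking exclusive fcntl (POSIX) lock, and the helper
name try_write_lock and the temporary demo path are hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def try_write_lock(path):
    """Try to take an exclusive fcntl (POSIX) lock on path without blocking.

    Returns the open file object on success (the lock is held until the
    file is closed), or None if another process already holds the lock.
    Any other error, such as ENOTCONN, is re-raised to the caller.
    """
    f = open(path, "a+")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        f.close()
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # lock is held by someone else
        raise
    return f


# Demo on a local temporary file instead of the gluster mount.
demo_path = os.path.join(tempfile.mkdtemp(), "fs-stress.lock")
lock = try_write_lock(demo_path)
if lock is None:
    print("lock is already held, skipping this run")
else:
    print("lock acquired, doing the work")
    lock.close()  # closing the file releases the lock
```

In this sketch a lock that is simply held elsewhere is reported as
None, while any other errno is raised, so a failure of the lock call
itself would be visible to the script and not only in the client log.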

NB: tbs-clust-XXX-data are protocol/client bricks

On servers I have this stack:

storage/posix
features/posix-locks
performance/io-threads
protocol/server

On clients I have this stack:

protocol/client(*3)
cluster/afr
performance/io-threads
performance/io-cache
performance/write-behind

--
Vincent Régnard
vregnard@xxxxxxxxxxxxxxxx
TBS-internet.com
027 630 5902