NFS Locking (Very complicated)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I have a cluster with 2 nodes that I want to access using NFS, there is no firewall between the server.
I'm using gluster 3.9.0

The volumes mount correctly on a client machine,  even if I get a strange "portmap query retrying: RPC: Program not registered"

[root@nf1 ~]# mount -vvv -t nfs -o nfsvers=3,rw gluster-nfs-a:/replicated-data /export/replicated-data
mount: fstab path: "/etc/fstab"
.........
mount: external mount: argv[5] = "rw,nfsvers=3"
mount.nfs: timeout set for Thu Jan 19 21:51:05 2017
mount.nfs: trying text-based options 'nfsvers=3,addr=10.1.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.4.0.28 prog 100003 vers 3 prot TCP port 2049
.......
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: portmap query retrying: RPC: Program not registered
.......
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 10.4.0.28 prog 100005 vers 3 prot TCP port 38465
gluster-nfs-a:/replicated-data on /export/replicated-data type nfs (rw,nfsvers=3)




When I try to lock a file on the client with something like:

flock -x -w 10 100 file (...)

the lock doesn't work and I get those line on the server nfs.log:

[2017-01-19 20:03:03.621556] I [rpc-clnt.c:1946:rpc_clnt_reconfig] 0-replicated-data-client-2: changing port to 49159 (from 0)
[2017-01-19 20:03:03.626031] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-replicated-data-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-01-19 20:03:03.626661] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-0: Connected to replicated-data-client-0, attached to remote volume '/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:03:03.626693] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:03:03.626803] I [MSGID: 108005] [afr-common.c:4429:afr_notify] 0-replicated-data-replicate-0: Subvolume 'replicated-data-client-0' came back up; going online.
[2017-01-19 20:03:03.626998] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-replicated-data-client-0: Server lk version = 1
[2017-01-19 20:03:03.630498] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-replicated-data-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-01-19 20:03:03.631152] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2: Connected to replicated-data-client-2, attached to remote volume '/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:03:03.631180] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:03:03.631651] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-replicated-data-client-2: Server lk version = 1
[2017-01-19 20:03:03.632563] I [MSGID: 108031] [afr-common.c:2178:afr_local_discovery_cbk] 0-replicated-data-replicate-0: selecting local read_child replicated-data-client-0
[2017-01-19 20:03:29.463006] W [MSGID: 112165] [nlm4.c:1484:nlm4svc_lock_common] 0-nfs-NLM: NLM in grace period
[2017-01-19 20:03:32.742975] W [MSGID: 112165] [nlm4.c:1804:nlm4svc_unlock] 0-nfs-NLM: NLM in grace period
The message "W [MSGID: 112165] [nlm4.c:1804:nlm4svc_unlock] 0-nfs-NLM: NLM in grace period" repeated 4 times between [2017-01-19 20:03:32.742975] and [2017-01-19 20:03:52.742771]
[2017-01-19 20:03:57.743581] W [MSGID: 112057] [nlm4.c:1744:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL [No locks available]
[2017-01-19 20:03:57.743652] W [MSGID: 112122] [nlm4.c:1759:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume [Operation not permitted]
[2017-01-19 20:05:03.237220] E [MSGID: 112163] [nlm4.c:570:nsm_monitor] 0-nfs-NLM: clnt_call(): RPC: Success
[2017-01-19 20:06:10.566790] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-replicated-data-client-2: server 10.1.0.33:49159 has not responded in the last 42 seconds, disconnecting.
[2017-01-19 20:06:13.380796] I [socket.c:3396:socket_submit_request] 0-replicated-data-client-2: not connected (priv->connected = -1)
[2017-01-19 20:06:13.380830] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x29 Program: GlusterFS 3.3, ProgVers: 330, Proc: 26) to rpc-transport (replicated-data-client-2)
[2017-01-19 20:06:13.380846] W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:06:45.739806] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2a Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2)
[2017-01-19 20:07:45.651815] W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:07:47.535963] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2d Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2)
[2017-01-19 20:08:45.764782] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2e Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2)
[2017-01-19 20:08:47.553021] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2f Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2)
The message "W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]" repeated 3 times between [2017-01-19 20:07:45.651815] and [2017-01-19 20:08:47.553057]
[2017-01-19 20:09:25.952509] E [MSGID: 112167] [nlm4.c:1009:nlm4_establish_callback] 0-nfs-NLM: Unable to get NLM port of the client. Is the firewall running on client? OR Are RPC services running (rpcinfo -p)?
[2017-01-19 20:09:25.953149] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(LK(26)) called at 2017-01-19 20:05:13.206864 (xid=0x23)
[2017-01-19 20:09:25.953181] W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:09:25.953845] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(LK(26)) called at 2017-01-19 20:05:23.235063 (xid=0x24)
[2017-01-19 20:09:25.954461] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(LK(26)) called at 2017-01-19 20:05:43.299167 (xid=0x26)
[2017-01-19 20:09:25.955045] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2017-01-19 20:05:28.562661 (xid=0x25)
[2017-01-19 20:09:25.955077] W [rpc-clnt-ping.c:202:rpc_clnt_ping_cbk] 0-replicated-data-client-2: socket disconnected
[2017-01-19 20:09:25.955567] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(STATFS(14)) called at 2017-01-19 20:05:45.774191 (xid=0x27)
[2017-01-19 20:09:25.955612] W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:09:25.956246] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(STATFS(14)) called at 2017-01-19 20:05:47.549472 (xid=0x28)
[2017-01-19 20:09:25.956373] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-replicated-data-client-2: disconnected from replicated-data-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-19 20:09:36.592738] I [rpc-clnt.c:1946:rpc_clnt_reconfig] 0-replicated-data-client-2: changing port to 49159 (from 0)
[2017-01-19 20:09:36.597437] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-replicated-data-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-01-19 20:09:36.598036] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2: Connected to replicated-data-client-2, attached to remote volume '/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:09:36.598059] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:09:36.598072] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 0-replicated-data-client-2: 1 fds open - Delaying child_up until they are re-opened
[2017-01-19 20:09:36.598343] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 0-replicated-data-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2017-01-19 20:09:36.598036] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2: Connected to replicated-data-client-2, attached to remote volume '/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:09:36.598059] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:09:36.598072] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 0-replicated-data-client-2: 1 fds open - Delaying child_up until they are re-opened
[2017-01-19 20:09:36.598343] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 0-replicated-data-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2017-01-19 20:09:36.598506] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-replicated-data-client-2: Server lk version = 1
[2017-01-19 20:10:47.538183] I [MSGID: 108031] [afr-common.c:2178:afr_local_discovery_cbk] 0-replicated-data-replicate-0: selecting local read_child replicated-data-client-0
The message "W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]" repeated 2 times between [2017-01-19 20:09:25.953181] and [2017-01-19 20:09:25.954495]
[2017-01-19 20:09:25.956278] W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:14:18.649616] W [socket.c:588:__socket_rwv] 0-NLM-client: readv on 10.4.0.13:32803 failed (No data available)
[2017-01-19 20:26:18.648786] I [MSGID: 101053] [mem-pool.c:638:mem_pool_destroy] 0-nfs-server: size=588 max=1 total=3
[2017-01-19 20:26:18.648831] I [MSGID: 101053] [mem-pool.c:638:mem_pool_destroy] 0-nfs-server: size=124 max=1 total=3


As I said there is no firewall and all the ports listed here are open and reachable from the client on both replica servers:


   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   tcp  38465  mountd
    100005    1   tcp  38466  mountd
    100003    3   tcp   2049  nfs
    100024    1   udp  53548  status
    100024    1   tcp  35281  status
    100021    4   tcp  38468  nlockmgr
    100021    1   udp    914  nlockmgr
    100227    3   tcp   2049  nfs_acl
    100021    1   tcp    917  nlockmgr


Also I don't understand this:

[2017-01-19 20:03:57.743581] W [MSGID: 112057] [nlm4.c:1744:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL [No locks available]
[2017-01-19 20:03:57.743652] W [MSGID: 112122] [nlm4.c:1759:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume [Operation not permitted]
[2017-01-19 20:05:03.237220] E [MSGID: 112163] [nlm4.c:570:nsm_monitor] 0-nfs-NLM: clnt_call(): RPC: Success
[2017-01-19 20:06:10.566790] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-replicated-data-client-2: server 10.1.0.33:49159 has not responded in the last 42 seconds, disconnecting.


Why the server I'm connected is trying to connected to the replica: 0-replicated-data-client-2: server 10.4.0.33:49159 ?


And this:

[2017-01-19 20:08:47.553021] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2f Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2)
The message "W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]" repeated 3 times between [2017-01-19 20:07:45.651815] and [2017-01-19 20:08:47.553057]
[2017-01-19 20:09:25.952509] E [MSGID: 112167] [nlm4.c:1009:nlm4_establish_callback] 0-nfs-NLM: Unable to get NLM port of the client. Is the firewall running on client? OR Are RPC services running (rpcinfo -p)?

Is here talking about the client trying to lock a file in the volume,  or actually the other replica server on 10.1.0.33 ?

Any idea?

Thanks,

C


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux