El 13/01/11 19:05, J. Bruce Fields escribiÃ:
On Thu, Jan 13, 2011 at 06:25:23PM +0100, Txema Heredia Genestar wrote:
El 13/01/11 17:19, J. Bruce Fields escribiÃ:
On Thu, Jan 13, 2011 at 04:48:26PM +0100, Txema Heredia Genestar wrote:
Hi Bruce, thanks for your answer
El 12/01/11 19:35, J. Bruce Fields escribiÃ:
On Wed, Jan 12, 2011 at 06:14:53PM +0100, Txema Heredia Genestar wrote:
Additionally, I have checked tcpdump and found, when mounting an
NFS4 drive from a working storage-system:
...
12:38:06.372303 IP client.907> storage.nfs: . ack 29 win 46
<nop,nop,timestamp 4063464822 174132214>
12:38:06.372429 IP client.2364980656> storage.nfs: 148 getattr [|nfs]
12:38:06.372792 IP storage.nfs> client.2364980656: reply ok 248
getattr [|nfs]
12:38:06.372958 IP client.2381757872> storage.nfs: 172 getattr [|nfs]
12:38:06.373132 IP storage.nfs> client.2381757872: reply ok 88
getattr [|nfs]
12:38:06.373157 IP client.2398535088> storage.nfs: 176 getattr [|nfs]
12:38:06.373316 IP storage.nfs> client.2398535088: reply ok 100
getattr [|nfs]
12:38:06.373339 IP client.2415312304> storage.nfs: 172 getattr [|nfs]
But when I mount from the same client, the NFS4 share from my server
gets stuck on the "getattr" call
...
12:36:37.051840 IP client.926> server.nfs: . ack 29 win 140
<nop,nop,timestamp 4063375488 434039929>
12:36:37.051903 IP client.1734362088> server.nfs: 148 getattr [|nfs]
12:36:37.090274 IP server.nfs> client.926: . ack 192 win 4742
<nop,nop,timestamp 434039939 4063375488>
---silence---
Something like wireshark would give a few more details.
I have wiresharked it and I don't see any differences between the
"getattr" packages in both cases. Do you want me to paste them in a
specific format?
I'm curious which attributes were requested. In particular, is the
unreplied-to getattr the *first* time that the client requests the owner
or owner_group attributes?
Yes, the "unreplied-to" getattr call is the very first (and only)
time it those are requested:
Yeah, so almost certainly some idmapping problem.
Mi current idmap library is 0.12-16.20, built on 16/1/2009
Might also be worth looking at the nfs4.idtoname cache contents after
the hang:
rpcdebug -m rpc -s cache
cat /proc/net/rpc/nfs4.idtoname/content
I seem to recall c9b6cbe56d3ac471e6cd72a59ec9e324b3417016 or
0a725fc4d3bfc4734164863d6c50208b109ca5c7 being possible causes of hangs.
--b.
Unfortunately, rpcdebug is not present in this server. So my
/proc/net/rpc/nfs4.idtoname/content file is empty.
May this command be of any use?
"echo "65535"> /proc/sys/sunrpc/rpc_debug"
Yeah doing that may get you a little more information.
Note that kernel's pretty out of date with respect to mainstream so your
best bet is your distro support or seeing if you can reproduce the
problem on a more recent kernel.
--b.
This is the /var/log/messages log after enabling debugging for nfs, nfsd
and rpc:
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 TCP (listen)
state change 10
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67780 allocated
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: nfsd: connect from unprivileged port:
172.22.201.203:40892
Jan 14 12:01:49 server kernel: nfsd: connect from 172.22.201.203:9fbc
Jan 14 12:01:49 server kernel: svc: svc_setup_socket ffff810282a67780
Jan 14 12:01:49 server kernel: setting up TCP socket for reading
Jan 14 12:01:49 server kernel: svc: svc_setup_socket created
ffff810167ed3d80 (inet ffff810125e547c0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by
daemon ffff81029edad000
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 40 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810149105028, 4056) = 40
Jan 14 12:01:49 server kernel: svc: TCP complete record (40 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=40
Jan 14 12:01:49 server kernel: svc: svc_authenticate (0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
sendto([ffff8101dfe10000 28... ], 28) = 28 (addr cbc916ac)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000, socket
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67d00 allocated
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP
(connected) state change 8 (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 1
Jan 14 12:01:49 server kernel: svc: svc_delete_socket(ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: server socket destroy delayed
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: releasing dead socket
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 TCP (listen)
state change 10
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67780 allocated
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: nfsd: connect from 172.22.201.203:03d0
Jan 14 12:01:49 server kernel: svc: svc_setup_socket ffff810282a67780
Jan 14 12:01:49 server kernel: setting up TCP socket for reading
Jan 14 12:01:49 server kernel: svc: svc_setup_socket created
ffff810167ed3d80 (inet ffff810125e547c0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff81029edad000
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 40 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff8102451ec028, 4056) = 40
Jan 14 12:01:49 server kernel: svc: TCP complete record (40 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=40
Jan 14 12:01:49 server kernel: svc: svc_authenticate (0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
sendto([ffff81017bafe000 28... ], 28) = 28 (addr cbc916ac)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67d00 allocated
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 144 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810273143090, 3952) = 144
Jan 14 12:01:49 server kernel: svc: TCP complete record (144 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: got len=144
Jan 14 12:01:49 server kernel: svc: svc_authenticate (1)
Jan 14 12:01:49 server kernel: Want update, refage=120, age=0
Jan 14 12:01:49 server kernel: svc: svc_process dropit
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 dropped request
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket
ffff810167ed3d80, inuse=2
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: revisit queued
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 0
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=144
Jan 14 12:01:49 server kernel: svc: svc_authenticate (1)
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 14 12:01:49 server kernel: nfsv4 compound op #1: 24
Jan 14 12:01:49 server kernel: Want update, refage=120, age=0
Jan 14 12:01:49 server kernel: nfsd: Dropping request due to malloc failure!
**Jan 14 12:01:49 server kernel: svc: svc_process dropit**
**Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 dropped
request**
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket
ffff810167ed3d80, inuse=2
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 0
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
---After I Ctrl+C 'd the mount command in the client:---
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 TCP
(connected) state change 8 (svsk ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 served by
daemon ffff8102a5ede400
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 TCP data
ready (svsk ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 busy, not
enqueued
Jan 14 12:01:57 server kernel: svc: server ffff8102a5ede400, socket
ffff810167ed3d80, inuse=2
Jan 14 12:01:57 server kernel: svc: tcp_recv ffff810167ed3d80 data 1
conn 0 close 1
Jan 14 12:01:57 server kernel: svc: svc_delete_socket(ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: server socket destroy delayed
Jan 14 12:01:57 server kernel: svc: got len=0
Jan 14 12:01:57 server kernel: svc: server ffff8102a5ede400 waiting for
data (to = 900000)
I also tried to run idmap in verbose mode to see what it says
# rpc.idmapd -vvv -f
rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel
but it doesn't send anything to messages and, when checking
/proc/net/rpc/nfs4.idtoname/content, it's still empty
# cat /proc/net/rpc/nfs4.idtoname/content
#domain type id [name]
May this be a hint that my nfsd and idmapd are not communicating? (and
so the getattr is never answered)
As for the kernel version, unfortunately I am not allowed to upgrade it
right now, as this is a production host (it's in a High Availability
cluster but I can do the nfs tests because the other one is serving the
"main" nfs right now), but I expect there will be an updating stop
during this year.
Txema
PS: Damn! I have just realized you are the author of idmap, thanks man!
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html