Re: NFSv4 memory allocation bug?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 El 13/01/11 19:05, J. Bruce Fields escribiÃ:
On Thu, Jan 13, 2011 at 06:25:23PM +0100, Txema Heredia Genestar wrote:
  El 13/01/11 17:19, J. Bruce Fields escribiÃ:
On Thu, Jan 13, 2011 at 04:48:26PM +0100, Txema Heredia Genestar wrote:
  Hi Bruce, thanks for your answer


El 12/01/11 19:35, J. Bruce Fields escribiÃ:
On Wed, Jan 12, 2011 at 06:14:53PM +0100, Txema Heredia Genestar wrote:
Additionally, I have checked tcpdump and found, when mounting an
NFS4 drive from a working storage-system:
...
12:38:06.372303 IP client.907>    storage.nfs: . ack 29 win 46
<nop,nop,timestamp 4063464822 174132214>
12:38:06.372429 IP client.2364980656>    storage.nfs: 148 getattr [|nfs]
12:38:06.372792 IP storage.nfs>    client.2364980656: reply ok 248
getattr [|nfs]
12:38:06.372958 IP client.2381757872>    storage.nfs: 172 getattr [|nfs]
12:38:06.373132 IP storage.nfs>    client.2381757872: reply ok 88
getattr [|nfs]
12:38:06.373157 IP client.2398535088>    storage.nfs: 176 getattr [|nfs]
12:38:06.373316 IP storage.nfs>    client.2398535088: reply ok 100
getattr [|nfs]
12:38:06.373339 IP client.2415312304>    storage.nfs: 172 getattr [|nfs]


But when I mount from the same client, the NFS4 share from my server
gets stuck on the "getattr" call
...
12:36:37.051840 IP client.926>    server.nfs: . ack 29 win 140
<nop,nop,timestamp 4063375488 434039929>
12:36:37.051903 IP client.1734362088>    server.nfs: 148 getattr [|nfs]
12:36:37.090274 IP server.nfs>    client.926: . ack 192 win 4742
<nop,nop,timestamp 434039939 4063375488>
---silence---
Something like wireshark would give a few more details.
I have wiresharked it and I don't see any differences between the
"getattr" packages in both cases. Do you want me to paste them in a
specific format?
I'm curious which attributes were requested.  In particular, is the
unreplied-to getattr the *first* time that the client requests the owner
or owner_group attributes?

Yes, the "unreplied-to" getattr call is the very first (and only)
time it those are requested:
Yeah, so almost certainly some idmapping problem.
Mi current idmap library is 0.12-16.20, built on 16/1/2009

Might also be worth looking at the nfs4.idtoname cache contents after
the hang:

	rpcdebug -m rpc -s cache
	cat /proc/net/rpc/nfs4.idtoname/content

I seem to recall c9b6cbe56d3ac471e6cd72a59ec9e324b3417016 or
0a725fc4d3bfc4734164863d6c50208b109ca5c7 being possible causes of hangs.

--b.
Unfortunately, rpcdebug is not present in this server. So my
/proc/net/rpc/nfs4.idtoname/content file is empty.

May this command be of any use?
"echo "65535">  /proc/sys/sunrpc/rpc_debug"
Yeah doing that may get you a little more information.

Note that kernel's pretty out of date with respect to mainstream so your
best bet is your distro support or seeing if you can reproduce the
problem on a more recent kernel.

--b.

This is the /var/log/messages log after enabling debugging for nfs, nfsd and rpc:

Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 TCP (listen) state change 10 Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by daemon ffff8102a5ede400 Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff8101fd09c080, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 conn 1 close 0 Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67780 allocated
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not enqueued Jan 14 12:01:49 server kernel: nfsd: connect from unprivileged port: 172.22.201.203:40892
Jan 14 12:01:49 server kernel: nfsd: connect from 172.22.201.203:9fbc
Jan 14 12:01:49 server kernel: svc: svc_setup_socket ffff810282a67780
Jan 14 12:01:49 server kernel: setting up TCP socket for reading
Jan 14 12:01:49 server kernel: svc: svc_setup_socket created ffff810167ed3d80 (inet ffff810125e547c0) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff810236f4f000 Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by daemon ffff81029edad000 Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data ready (svsk ffff810167ed3d80) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 40 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810149105028, 4056) = 40
Jan 14 12:01:49 server kernel: svc: TCP complete record (40 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=40
Jan 14 12:01:49 server kernel: svc: svc_authenticate (0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 sendto([ffff8101dfe10000 28... ], 28) = 28 (addr cbc916ac) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff81029edad000, socket ffff8101fd09c080, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 conn 1 close 0 Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67d00 allocated
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP (connected) state change 8 (svsk ffff810167ed3d80) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff8102a5ede400 Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data ready (svsk ffff810167ed3d80) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 1
Jan 14 12:01:49 server kernel: svc: svc_delete_socket(ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: server socket destroy delayed
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: releasing dead socket
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 TCP (listen) state change 10 Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by daemon ffff8102a5ede400 Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff8101fd09c080, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 conn 1 close 0 Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67780 allocated
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not enqueued
Jan 14 12:01:49 server kernel: nfsd: connect from 172.22.201.203:03d0
Jan 14 12:01:49 server kernel: svc: svc_setup_socket ffff810282a67780
Jan 14 12:01:49 server kernel: setting up TCP socket for reading
Jan 14 12:01:49 server kernel: svc: svc_setup_socket created ffff810167ed3d80 (inet ffff810125e547c0) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data ready (svsk ffff810167ed3d80) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff81029edad000 Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff81029edad000, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 40 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff8102451ec028, 4056) = 40
Jan 14 12:01:49 server kernel: svc: TCP complete record (40 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=40
Jan 14 12:01:49 server kernel: svc: svc_authenticate (0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 sendto([ffff81017bafe000 28... ], 28) = 28 (addr cbc916ac) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff81029edad000 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket ffff8101fd09c080, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 conn 1 close 0 Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67d00 allocated
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data ready (svsk ffff810167ed3d80) Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff8102a5ede400 Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 144 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810273143090, 3952) = 144
Jan 14 12:01:49 server kernel: svc: TCP complete record (144 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: got len=144
Jan 14 12:01:49 server kernel: svc: svc_authenticate (1)
Jan 14 12:01:49 server kernel: Want update, refage=120, age=0
Jan 14 12:01:49 server kernel: svc: svc_process dropit
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 dropped request
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket ffff810167ed3d80, inuse=2 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for data (to = 900000)
Jan 14 12:01:49 server kernel: revisit queued
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff810236f4f000 Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket ffff810167ed3d80, inuse=1 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 0 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=144
Jan 14 12:01:49 server kernel: svc: svc_authenticate (1)
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 14 12:01:49 server kernel: nfsv4 compound op #1: 24
Jan 14 12:01:49 server kernel: Want update, refage=120, age=0
Jan 14 12:01:49 server kernel: nfsd: Dropping request due to malloc failure!
**Jan 14 12:01:49 server kernel: svc: svc_process dropit**
**Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 dropped request** Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for data (to = 900000) Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket ffff810167ed3d80, inuse=2 Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 0 conn 0 close 0 Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000)


---After I Ctrl+C 'd the mount command in the client:---


Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 TCP (connected) state change 8 (svsk ffff810167ed3d80) Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 served by daemon ffff8102a5ede400 Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 TCP data ready (svsk ffff810167ed3d80) Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 busy, not enqueued Jan 14 12:01:57 server kernel: svc: server ffff8102a5ede400, socket ffff810167ed3d80, inuse=2 Jan 14 12:01:57 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 conn 0 close 1
Jan 14 12:01:57 server kernel: svc: svc_delete_socket(ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: server socket destroy delayed
Jan 14 12:01:57 server kernel: svc: got len=0
Jan 14 12:01:57 server kernel: svc: server ffff8102a5ede400 waiting for data (to = 900000)


I also tried to run idmap in verbose mode to see what it says
 # rpc.idmapd -vvv -f
rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel

but it doesn't send anything to messages and, when checking /proc/net/rpc/nfs4.idtoname/content, it's still empty
# cat /proc/net/rpc/nfs4.idtoname/content
#domain type id [name]

May this be a hint that my nfsd and idmapd are not communicating? (and so the getattr is never answered)


As for the kernel version, unfortunately I am not allowed to upgrade it right now, as this is a production host (it's in a High Availability cluster but I can do the nfs tests because the other one is serving the "main" nfs right now), but I expect there will be an updating stop during this year.

Txema

PS: Damn! I have just realized you are the author of idmap, thanks man!
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux