NFSv4 memory allocation bug?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 Hello everybody,

I have a host serving disk through the network using NFS v3, and it has been working ok for 3 years. Recently I have been benchmarking its throughput and I wanted to test how it would work if I used NFS v4 instead, but I couldn't mount a single drive.

This is what I see:

Client:
client# mount -v -t nfs4 serverurl:/ /mnt/NFS_test/
mount: pinging: prog 100003 vers 4 prot tcp port 2049
---no response---

Server:
No error messages. Once I enabled debugging with echo "65535" > /proc/sys/sunrpc/nfsd_debug :

/var/log/messages:
Jan 12 13:02:35 server kernel: NFSD: laundromat service - starting
Jan 12 13:02:35 server kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 12 13:02:40 server kernel: nfsv4 compound op #1: 24
Jan 12 13:02:40 server kernel: nfsd: Dropping request due to malloc failure!


And if rpc debugging is enabled:

Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 busy, not enqueued
Jan 11 18:50:48 server kernel: svc: calling dispatcher
Jan 11 18:50:48 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 sendto([ffff8102273eb000 28... ], 28) = 28 (addr cbc916ac) Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 busy, not enqueued Jan 11 18:50:48 server kernel: svc: server ffff81027d33d000 waiting for data (to = 900000) Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket ffff810127c28880, inuse=1 Jan 11 18:50:48 server kernel: svc: tcp_recv ffff810127c28880 data 0 conn 1 close 0 Jan 11 18:50:48 server kernel: svc: tcp_accept ffff810127c28880 sock ffff81029821b340
Jan 11 18:50:48 server kernel: svc: tcp_accept ffff81027c7f08c0 allocated
Jan 11 18:50:48 server kernel: svc: got len=0
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for data (to = 900000) Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket ffff81029c53a580, inuse=1 Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1 conn 0 close 0 Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for data (to = 900000) Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 TCP data ready (svsk ffff81029c53a580) Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by daemon ffff810251911800 Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket ffff81029c53a580, inuse=1 Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1 conn 0 close 0 Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 recvfrom(ffff81029c53a62c, 0) = 4
Jan 11 18:50:48 server kernel: svc: TCP record, 144 bytes
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 recvfrom(ffff81011f93a090, 3952) = 144
Jan 11 18:50:48 server kernel: svc: TCP complete record (144 bytes)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by daemon ffff81025e481400
Jan 11 18:50:48 server kernel: svc: got len=144
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket ffff81029c53a580, inuse=2
Jan 11 18:50:48 server kernel: svc: svc_authenticate (1)
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1 conn 0 close 0 Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: Want update, refage=120, age=0
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for data (to = 900000)
Jan 11 18:50:48 server kernel: svc: svc_process dropit
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 dropped request
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for data (to = 900000)
Jan 11 18:50:48 server kernel: revisit queued
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by daemon ffff810251911800 Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket ffff81029c53a580, inuse=1 Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 0 conn 0 close 0 Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by daemon ffff81025e481400
Jan 11 18:50:48 server kernel: svc: got len=144
Jan 11 18:50:48 server kernel: svc: svc_authenticate (1)
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket ffff81029c53a580, inuse=2
Jan 11 18:50:48 server kernel: svc: calling dispatcher
Jan 11 18:50:48 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 0 conn 0 close 0 Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: nfsv4 compound op #1: 24
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for data (to = 900000)
Jan 11 18:50:48 server kernel: Want update, refage=120, age=0
Jan 11 18:50:48 server kernel: nfsd: Dropping request due to malloc failure!
Jan 11 18:50:48 server kernel: svc: svc_process dropit
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 dropped request
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for data (to = 900000)



Additionally, I have checked tcpdump and found, when mounting an NFS4 drive from a working storage-system:
...
12:38:06.372303 IP client.907 > storage.nfs: . ack 29 win 46 <nop,nop,timestamp 4063464822 174132214>
12:38:06.372429 IP client.2364980656 > storage.nfs: 148 getattr [|nfs]
12:38:06.372792 IP storage.nfs > client.2364980656: reply ok 248 getattr [|nfs]
12:38:06.372958 IP client.2381757872 > storage.nfs: 172 getattr [|nfs]
12:38:06.373132 IP storage.nfs > client.2381757872: reply ok 88 getattr [|nfs]
12:38:06.373157 IP client.2398535088 > storage.nfs: 176 getattr [|nfs]
12:38:06.373316 IP storage.nfs > client.2398535088: reply ok 100 getattr [|nfs]
12:38:06.373339 IP client.2415312304 > storage.nfs: 172 getattr [|nfs]


But when I mount from the same client, the NFS4 share from my server gets stuck on the "getattr" call
...
12:36:37.051840 IP client.926 > server.nfs: . ack 29 win 140 <nop,nop,timestamp 4063375488 434039929>
12:36:37.051903 IP client.1734362088 > server.nfs: 148 getattr [|nfs]
12:36:37.090274 IP server.nfs > client.926: . ack 192 win 4742 <nop,nop,timestamp 434039939 4063375488>
---silence---



So I suppose that the "RPC: TCP recvfrom got EAGAIN" on the messages log corresponds to that "getattr[|nfs]" call.

I have been searching around and I have found several threads about either the "malloc failure" message or the "EAGAIN" message. But I haven't found anything concerning them both at the same time. I have also checked for this kind of problems in NFS4 and found nothing useful.

May this be some kind of (already solved) bug in my nfs implementation? I'm running a pretty old version (SuSE LES 10.2, nfs-utils 1.0.7-36.2)

Thanks in advance,

Txema




My configuration files:

# cat /etc/sysconfig/nfs
USE_KERNEL_NFSD_NUMBER="4"
MOUNTD_PORT="980"
NFS_SECURITY_GSS="no"
NFS4_SUPPORT="yes"

# cat /etc/exports
#TESTS NFS 4
/NFS4exports 172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,secure_locks,acl,mp,fsid=0,no_wdelay) /NFS4exports 172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,secure_locks,acl,mp,fsid=0,no_wdelay) /NFS4exports/NFS_test_params 172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay) /NFS4exports/NFS_test_params 172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
#NFS 3 working ones
#/sp/fs/NFS_test_params 172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay) 172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)

# cat /etc/idmapd.conf
[General]
Verbosity=1
Pipefs-Directory=/var/lib/rpc_pipefs
Domain=mydomain
[Mapping]
Nobody-User=nobody
Nobody-Group=nobody

PS: This is my very first post on this list, so I'm not sure if I am missing something.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux