We are running kernel 2.6.18-164.6.1.el5 and exporting three AoE-backed ext4 directories over NFS. For a couple of weeks a small number of users used the system with no issues. Today we added 7 users; the system crashed and has not performed correctly since.

Nov 23 10:20:03 sulphur rpc.idmapd[5199]: nfsdcb: id '-2' too big!
Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: Setting version failed: errno 16 (Device or resource busy)
Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: unable to bind UPD socket: errno 98 (Address already in use)
Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy(): cache `nfsd4_files': Can't free all objects
Nov 23 10:42:26 sulphur kernel: [<ffffffff88645efd>] :nfsd:nfsd4_free_slab+0x11/0x4d
Nov 23 10:42:26 sulphur kernel: [<ffffffff88645f55>] :nfsd:nfsd4_free_slabs+0x1c/0x33
Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: BUG: warning at fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G )
Nov 23 10:42:26 sulphur kernel: [<ffffffff88645f55>] :nfsd:nfsd4_free_slabs+0x1c/0x33
Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy(): cache `nfsd4_delegations': Can't free all objects
Nov 23 10:42:26 sulphur kernel: [<ffffffff88645efd>] :nfsd:nfsd4_free_slab+0x11/0x4d
Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: BUG: warning at fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G )
Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
Nov 23 10:42:26 sulphur kernel: nfsd: last server has exited
Nov 23 10:42:26 sulphur kernel: nfsd: unexporting all filesystems
Nov 23 10:42:44 sulphur kernel: kmem_cache_create: duplicate cache nfsd4_files
Nov 23 10:42:44 sulphur kernel: [<ffffffff88646f29>] :nfsd:nfs4_state_start+0x52/0x18f
Nov 23 10:42:44 sulphur kernel: [<ffffffff886303ae>] :nfsd:nfsd_svc+0x6c/0x1e9
Nov 23 10:42:44 sulphur kernel: [<ffffffff88630f8e>] :nfsd:write_threads+0x0/0xa9
Nov 23 10:42:44 sulphur kernel: [<ffffffff88630ffd>] :nfsd:write_threads+0x6f/0xa9
Nov 23 10:42:44 sulphur kernel: [<ffffffff88630f8e>] :nfsd:write_threads+0x0/0xa9
Nov 23 10:42:44 sulphur kernel: [<ffffffff88630d59>] :nfsd:nfsctl_transaction_write+0x42/0x77
Nov 23 10:42:44 sulphur nfsd[27369]: nfssvc: Cannot allocate memory
Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: Setting version failed: errno 16 (Device or resource busy)
Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: unable to bind UPD socket: errno 98 (Address already in use)

The log above shows the original problem, followed by my attempts to restart the NFS service; eventually I had to reboot the server. Since then it has been behaving bizarrely: it runs for about 5 minutes and then stops, and after a restart it runs for a while and then stops again.

Nov 23 11:04:46 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:17:02 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
Nov 23 11:29:01 sulphur kernel: nfsd: last server has exited
Nov 23 11:29:01 sulphur kernel: nfsd: unexporting all filesystems
Nov 23 11:29:08 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:29:08 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
Nov 23 11:32:03 sulphur kernel: nfsd: last server has exited
Nov 23 11:32:03 sulphur kernel: nfsd: unexporting all filesystems
Nov 23 11:32:34 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:32:34 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
Nov 23 11:41:58 sulphur kernel: nfsd: last server has exited
Nov 23 11:41:58 sulphur kernel: nfsd: unexporting all filesystems
Nov 23 11:42:03 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:42:03 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
Nov 23 11:47:20 sulphur kernel: nfsd: last server has exited
Nov 23 11:47:20 sulphur kernel: nfsd: unexporting all filesystems

I haven't found any reports of issues involving the "nfsdcb: id '-2' too big!" message, but equally I don't know what it means.

On the console we are seeing loads of these messages:

kernel: NFSD: preprocess_seqid_op: magic stateid!

Again, I don't know what this message means or what its implications are.

Any suggestions would be welcome. At the moment we are up, with two users migrated back to the old servers.

Thanks,
Phil.

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos