constant "failed to submit message" to rpc-transport

Michael Brown <michael@xxxxxxxxxxxx> · Sun, 05 May 2013 02:22:37 -0400



    After a bit of load, I constantly find my gluster server getting
    into a state where it seems to be unable to reply to NFS RPCs:

    
    [2013-05-05 01:31:16.421507] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3705786983x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
    [2013-05-05 01:31:16.421528] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
    http://pastie.org/7803022
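
    To see which NFS procedures these failures cluster on, the errors
    can be tallied from the log. The following is only a rough sketch
    in Python, assuming every error line has exactly the format quoted
    above. The function name count_submit_failures and the name table
    are illustrative and not part of GlusterFS. The procedure numbers
    follow the NFSv3 specification (RFC 1813), where procedure 6 is
    READ.

```python
import re
from collections import Counter

# NFSv3 procedure numbers as defined in RFC 1813.
NFS3_PROCS = {
    0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP", 4: "ACCESS",
    5: "READLINK", 6: "READ", 7: "WRITE", 8: "CREATE", 9: "MKDIR",
    10: "SYMLINK", 11: "MKNOD", 12: "REMOVE", 13: "RMDIR", 14: "RENAME",
    15: "LINK", 16: "READDIR", 17: "READDIRPLUS", 18: "FSSTAT",
    19: "FSINFO", 20: "PATHCONF", 21: "COMMIT",
}

# Matches the rpcsvc_submit_generic error line as quoted in this post.
# Assumption: the field layout is stable; the trailing "x" after the XID
# appears in the quoted line, so it is treated as optional here.
SUBMIT_FAIL = re.compile(
    r"\[(?P<ts>[^\]]+)\] E \[rpcsvc\.c:\d+:rpcsvc_submit_generic\] .*?"
    r"failed to submit message \(XID: (?P<xid>0x[0-9a-fA-F]+)x?, "
    r"Program: (?P<prog>\w+), ProgVers: (?P<vers>\d+), Proc: (?P<proc>\d+)\)"
)


def count_submit_failures(lines):
    """Count failed reply submissions per (program, procedure name)."""
    counts = Counter()
    for line in lines:
        match = SUBMIT_FAIL.search(line)
        if match is None:
            continue  # not a "failed to submit message" line
        proc = int(match.group("proc"))
        prog = match.group("prog")
        # Only translate procedure numbers for NFS3; keep others numeric.
        name = NFS3_PROCS.get(proc, str(proc)) if prog == "NFS3" else str(proc)
        counts[(prog, name)] += 1
    return counts
```

    To use it, open whichever file the NFS server logs to and pass
    the file object to count_submit_failures(). If nearly all failures
    are READ replies, that would at least be consistent with the
    daemon holding on to reply data.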

    
    Any idea what to do about it?

    
    The NFS daemon also grows very large (37.3G resident in the
    listing below); I suspect it's buffering data for all these RPCs:

      PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    30674 root       20   0 37.6G 37.3G  2288 R 99.0 29.6  5:29.88 /usr/local/glusterfs/sbin/glusterfs
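
    To tell whether the growth tracks the RPC failures, the resident
    size can be sampled over time rather than read off a single
    listing. The following is a minimal sketch in Python, assuming a
    Linux host where /proc/<pid>/status reports a VmRSS line in kB.
    The helper names are made up for illustration.

```python
import re


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())
```

    Calling read_rss_kib(30674) every few seconds and logging the
    result next to a timestamp would give a series that can be lined
    up against the error timestamps in the NFS log.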

    
    I'm running 3.3.1 with a few patches:
    https://github.com/Supermathie/glusterfs/tree/release-3.3-oracle

    
    Workload is usually Oracle DNFS.

    
    M.

    -- 
Michael Brown               | `One of the main causes of the fall of
Systems Consultant          | the Roman Empire was that, lacking zero,
Net Direct Inc.             | they had no way to indicate successful
☎: +1 519 883 1172 x5106    | termination of their C programs.' - Firth