Hi Glomski,
This is the second time I am hearing about memory allocation
problems in 3.7.6, but this time on the brick side. Are you able to
recreate this issue? Would it be possible to get statedumps of the
brick processes just before they crash?
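If it helps, a dump for all bricks of a volume can be triggered from
any server node with (assuming the default server.statedump-path, the
files should appear under /var/run/gluster/ on each brick host):

    gluster volume statedump <VOLNAME>

or, for a single brick process, by sending it SIGUSR1:

    kill -USR1 <pid of glusterfsd>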
Pranith
On 12/22/2015 02:25 AM, Glomski, Patrick wrote:
Hello,
We've recently upgraded from gluster 3.6.6 to 3.7.6 and have
started encountering dmesg page allocation errors (stack trace
is appended).
It appears that glusterfsd now sometimes fills up the cache
completely and crashes with a page allocation failure. I
*believe* it mainly happens when copying lots of new data to
the system, running a 'find', or similar. Hosts are all
Scientific Linux 6.6 and these errors occur consistently on
two separate gluster pools.
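(For reference, order:5 in the trace below means the kernel failed to
find 32 contiguous pages, i.e. a 128 KiB block. Whether such blocks
are still available can be checked per zone with

    cat /proc/buddyinfo

and, purely as a temporary stopgap rather than a fix, the
page/dentry/inode caches can be dropped with

    sync; echo 3 > /proc/sys/vm/drop_caches

on the affected host.)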
Has anyone else seen this issue, and are there any known fixes
for it via sysctl kernel parameters or other means?
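(A commonly suggested generic mitigation for high-order atomic
allocation failures is to raise vm.min_free_kbytes so the kernel keeps
a larger free reserve, e.g. something like

    sysctl -w vm.min_free_kbytes=262144

made persistent in /etc/sysctl.conf; the value above is only
illustrative, and whether this actually addresses the problem here or
merely masks it is unclear.)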
Please let me know of any other diagnostic information that
would help.
Thanks,
Patrick
[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
[1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1
[1458118.134702] Call Trace:
[1458118.134714] [<ffffffff8113770c>] ? __alloc_pages_nodemask+0x7dc/0x950
[1458118.134728] [<ffffffffa0321800>] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]
[1458118.134733] [<ffffffff81176e92>] ? kmem_getpages+0x62/0x170
[1458118.134735] [<ffffffff81177aaa>] ? fallback_alloc+0x1ba/0x270
[1458118.134736] [<ffffffff811774ff>] ? cache_grow+0x2cf/0x320
[1458118.134738] [<ffffffff81177829>] ? ____cache_alloc_node+0x99/0x160
[1458118.134743] [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134744] [<ffffffff81178479>] ? __kmalloc+0x199/0x230
[1458118.134746] [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134748] [<ffffffff8146001a>] ? __pskb_pull_tail+0x2aa/0x360
[1458118.134751] [<ffffffff8146f389>] ? harmonize_features+0x29/0x70
[1458118.134753] [<ffffffff8146f9f4>] ? dev_hard_start_xmit+0x1c4/0x490
[1458118.134758] [<ffffffff8148cf8a>] ? sch_direct_xmit+0x15a/0x1c0
[1458118.134759] [<ffffffff8146ff68>] ? dev_queue_xmit+0x228/0x320
[1458118.134762] [<ffffffff8147665d>] ? neigh_connected_output+0xbd/0x100
[1458118.134766] [<ffffffff814abc67>] ? ip_finish_output+0x287/0x360
[1458118.134767] [<ffffffff814abdf8>] ? ip_output+0xb8/0xc0
[1458118.134769] [<ffffffff814ab04f>] ? __ip_local_out+0x9f/0xb0
[1458118.134770] [<ffffffff814ab085>] ? ip_local_out+0x25/0x30
[1458118.134772] [<ffffffff814ab580>] ? ip_queue_xmit+0x190/0x420
[1458118.134773] [<ffffffff81137059>] ? __alloc_pages_nodemask+0x129/0x950
[1458118.134776] [<ffffffff814c0c54>] ? tcp_transmit_skb+0x4b4/0x8b0
[1458118.134778] [<ffffffff814c319a>] ? tcp_write_xmit+0x1da/0xa90
[1458118.134779] [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
[1458118.134780] [<ffffffff814c3a80>] ? tcp_push_one+0x30/0x40
[1458118.134782] [<ffffffff814b410c>] ? tcp_sendmsg+0x9cc/0xa20
[1458118.134786] [<ffffffff8145836b>] ? sock_aio_write+0x19b/0x1c0
[1458118.134788] [<ffffffff814581d0>] ? sock_aio_write+0x0/0x1c0
[1458118.134791] [<ffffffff8119169b>] ? do_sync_readv_writev+0xfb/0x140
[1458118.134797] [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
[1458118.134801] [<ffffffff8123e92f>] ? selinux_file_permission+0xbf/0x150
[1458118.134804] [<ffffffff812316d6>] ? security_file_permission+0x16/0x20
[1458118.134806] [<ffffffff81192746>] ? do_readv_writev+0xd6/0x1f0
[1458118.134807] [<ffffffff811928a6>] ? vfs_writev+0x46/0x60
[1458118.134809] [<ffffffff811929d1>] ? sys_writev+0x51/0xd0
[1458118.134812] [<ffffffff810e88ae>] ? __audit_syscall_exit+0x25e/0x290
[1458118.134816] [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel