Re: [Gluster-devel] glusterfsd crash due to page allocation failure

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Mon, 28 Dec 2015 10:29:00 +0530

After debugging with David, we found that the issue is already fixed for 
3.7.7 by the patch http://review.gluster.org/12312

Pranith

On 12/22/2015 10:45 PM, David Robinson wrote:
Niels,

> 1. how is infiniband involved/configured in this environment?

gfsib01bkp and gfs02bkp are connected via infiniband. We are using tcp 
transport as I never was able to get RDMA to work.

Volume Name: gfsbackup
Type: Distribute
Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
Status: Started
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup

> 2. was there a change/update of the driver (kernel update maybe?)
Before upgrading these servers from gluster 3.6.6 to 3.7.6, I did a 
'yum update' which did upgrade the kernel.
Current kernel is 2.6.32-573.12.1.el6.x86_64

> 3. do you get a coredump of the glusterfsd process when this happens?
There is a series of core files in /, all written around the same time
that this happens.
-rw-------    1 root root  168865792 Dec 22 10:45 core.3700
-rw-------    1 root root  168861696 Dec 22 10:45 core.3661
-rw-------    1 root root  168861696 Dec 22 10:45 core.3706
-rw-------    1 root root  168861696 Dec 22 10:45 core.3677
-rw-------    1 root root  168861696 Dec 22 10:45 core.3669
-rw-------    1 root root  168857600 Dec 22 10:45 core.3654
-rw-------    1 root root  254345216 Dec 22 10:45 core.3693
-rw-------    1 root root  254341120 Dec 22 10:45 core.3685

> 4. is this a fuse mount process, or a brick process? (check by PID?)
I have rebooted the machine as it was in a bad state and I could no 
longer write to the gluster volume.
When it does it again, I will check the PID.

This machine has both brick processes and fuse mounts.  The storage 
servers mount the volume through a fuse mount and then I use rsync to 
back up my primary storage system.
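
For the "check by PID" step, here is a rough Python sketch of one way to tell a brick process from a client process by reading /proc/<pid>/cmdline. It assumes that brick processes run as "glusterfsd" and carry a --brick-name argument, and that fuse mounts run as "glusterfs"; other gluster daemons can also run the glusterfs binary, so treat the result as a hint only. The function names and the sample arguments are hypothetical.

```python
import os


def classify_gluster_cmdline(argv):
    """Guess the role of a gluster process from its argument list.

    Assumption: bricks are started as 'glusterfsd' with --brick-name;
    anything running as plain 'glusterfs' is treated as a client.
    """
    if not argv:
        return "unknown"
    exe = os.path.basename(argv[0])
    has_brick_arg = any(
        a == "--brick-name" or a.startswith("--brick-name=") for a in argv
    )
    if has_brick_arg or exe == "glusterfsd":
        return "brick"
    if exe == "glusterfs":
        return "client"  # possibly a fuse mount, possibly another daemon
    return "unknown"


def read_cmdline(pid):
    """Return the argument list of a running process (Linux /proc only)."""
    with open("/proc/%d/cmdline" % pid, "rb") as f:
        raw = f.read()
    return [a.decode(errors="replace") for a in raw.split(b"\0") if a]


if __name__ == "__main__":
    # Hypothetical argument lists, for illustration only.
    brick = ["/usr/sbin/glusterfsd", "--brick-name", "/data/brick01bkp/gfsbackup"]
    mount = ["/usr/sbin/glusterfs", "--volfile-id=gfsbackup", "/mnt/backup"]
    print(classify_gluster_cmdline(brick))
    print(classify_gluster_cmdline(mount))
```

On a live system, classify_gluster_cmdline(read_cmdline(pid)) would be called with the Pid value printed in the kernel message, while that process is still running.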

David

 Hello,

 We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started
 encountering dmesg page allocation errors (stack trace is appended).

 It appears that glusterfsd now sometimes fills up the cache completely and
 crashes with a page allocation failure. I *believe* it mainly happens when
 copying lots of new data to the system, running a 'find', or similar. Hosts
 are all Scientific Linux 6.6 and these errors occur consistently on two
 separate gluster pools.

 Has anyone else seen this issue and are there any known fixes for it via
 sysctl kernel parameters or other means?

 Please let me know of any other diagnostic information that would help.

Could you explain a little more about this? The below is a message from
the kernel telling you that the mlx4_ib (Mellanox Infiniband?) driver is
requesting more contiguous memory than is immediately available.

So, the questions I have regarding this:

1. how is infiniband involved/configured in this environment?
2. was there a change/update of the driver (kernel update maybe?)
3. do you get a coredump of the glusterfsd process when this happens?
4. is this a fuse mount process, or a brick process? (check by PID?)
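
As a rough illustration of what the "order:5, mode:0x20" fields in the kernel message mean, the Python sketch below decodes them. It assumes 4 KiB pages and the __GFP_* bit values as I recall them from include/linux/gfp.h in 2.6.32; please verify against the source of the kernel actually in use. The function name is hypothetical.

```python
import re

PAGE_SIZE = 4096  # assumed page size (x86_64 default)

# Assumed low GFP bits, per the 2.6.32 include/linux/gfp.h.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_alloc_failure(line):
    """Extract order and mode from a 'page allocation failure' line."""
    m = re.search(r"order:(\d+),\s*mode:(0x[0-9a-fA-F]+)", line)
    if m is None:
        return None
    order = int(m.group(1))
    mode = int(m.group(2), 16)
    return {
        "order": order,
        "pages": 1 << order,                   # 2**order contiguous pages
        "bytes": (1 << order) * PAGE_SIZE,
        "flags": [n for b, n in sorted(GFP_BITS.items()) if mode & b],
        "may_sleep": bool(mode & 0x10),        # __GFP_WAIT set?
    }


info = decode_alloc_failure(
    "glusterfsd: page allocation failure. order:5, mode:0x20"
)
print(info)
```

Under those assumptions the request is for 32 physically contiguous pages (128 KiB) with only __GFP_HIGH set, i.e. an atomic allocation that is not allowed to sleep and wait for memory to be reclaimed. Such a request can fail on a fragmented system even when plenty of memory is free in total.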

Thanks,
Niels

 Thanks,
 Patrick

 [1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
 [1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1
 [1458118.134702] Call Trace:
 [1458118.134714]  [<ffffffff8113770c>] ? __alloc_pages_nodemask+0x7dc/0x950
 [1458118.134728]  [<ffffffffa0321800>] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]
 [1458118.134733]  [<ffffffff81176e92>] ? kmem_getpages+0x62/0x170
 [1458118.134735]  [<ffffffff81177aaa>] ? fallback_alloc+0x1ba/0x270
 [1458118.134736]  [<ffffffff811774ff>] ? cache_grow+0x2cf/0x320
 [1458118.134738]  [<ffffffff81177829>] ? ____cache_alloc_node+0x99/0x160
 [1458118.134743]  [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
 [1458118.134744]  [<ffffffff81178479>] ? __kmalloc+0x199/0x230
 [1458118.134746]  [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
 [1458118.134748]  [<ffffffff8146001a>] ? __pskb_pull_tail+0x2aa/0x360
 [1458118.134751]  [<ffffffff8146f389>] ? harmonize_features+0x29/0x70
 [1458118.134753]  [<ffffffff8146f9f4>] ? dev_hard_start_xmit+0x1c4/0x490
 [1458118.134758]  [<ffffffff8148cf8a>] ? sch_direct_xmit+0x15a/0x1c0
 [1458118.134759]  [<ffffffff8146ff68>] ? dev_queue_xmit+0x228/0x320
 [1458118.134762]  [<ffffffff8147665d>] ? neigh_connected_output+0xbd/0x100
 [1458118.134766]  [<ffffffff814abc67>] ? ip_finish_output+0x287/0x360
 [1458118.134767]  [<ffffffff814abdf8>] ? ip_output+0xb8/0xc0
 [1458118.134769]  [<ffffffff814ab04f>] ? __ip_local_out+0x9f/0xb0
 [1458118.134770]  [<ffffffff814ab085>] ? ip_local_out+0x25/0x30
 [1458118.134772]  [<ffffffff814ab580>] ? ip_queue_xmit+0x190/0x420
 [1458118.134773]  [<ffffffff81137059>] ? __alloc_pages_nodemask+0x129/0x950
 [1458118.134776]  [<ffffffff814c0c54>] ? tcp_transmit_skb+0x4b4/0x8b0
 [1458118.134778]  [<ffffffff814c319a>] ? tcp_write_xmit+0x1da/0xa90
 [1458118.134779]  [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
 [1458118.134780]  [<ffffffff814c3a80>] ? tcp_push_one+0x30/0x40
 [1458118.134782]  [<ffffffff814b410c>] ? tcp_sendmsg+0x9cc/0xa20
 [1458118.134786]  [<ffffffff8145836b>] ? sock_aio_write+0x19b/0x1c0
 [1458118.134788]  [<ffffffff814581d0>] ? sock_aio_write+0x0/0x1c0
 [1458118.134791]  [<ffffffff8119169b>] ? do_sync_readv_writev+0xfb/0x140
 [1458118.134797]  [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
 [1458118.134801]  [<ffffffff8123e92f>] ? selinux_file_permission+0xbf/0x150
 [1458118.134804]  [<ffffffff812316d6>] ? security_file_permission+0x16/0x20
 [1458118.134806]  [<ffffffff81192746>] ? do_readv_writev+0xd6/0x1f0
 [1458118.134807]  [<ffffffff811928a6>] ? vfs_writev+0x46/0x60
 [1458118.134809]  [<ffffffff811929d1>] ? sys_writev+0x51/0xd0
 [1458118.134812]  [<ffffffff810e88ae>] ? __audit_syscall_exit+0x25e/0x290
 [1458118.134816]  [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

 _______________________________________________
 Gluster-devel mailing list
 Gluster-devel@xxxxxxxxxxx
 http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
