On 12/22/2015 09:10 PM, David Robinson wrote:
Pranith,
This issue continues to happen. If you could provide
instructions for capturing the statedump, I would be happy to
send you that information.
I am not sure how to get a statedump just before the crash,
as the crash is intermittent.
Command: gluster volume statedump <volname>
This generates statedump files in the /var/run/gluster/ directory. Do
you think you can execute this command at regular intervals (every 'X'
minutes, say) until the crash is hit? Post these files and hopefully
that should be good enough to fix the problem.
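For example, a simple loop like the following, run on one of the servers,
would do it (the volume name 'gv0' and the 10-minute interval are only
placeholders; adjust them for your setup):

    while true; do
        gluster volume statedump gv0   # dump files appear under /var/run/gluster/
        sleep 600                      # wait 10 minutes between dumps
    done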
Pranith
David
------ Original Message ------
Sent: 12/21/2015 11:59:33 PM
Subject: Re: glusterfsd crash due to page allocation failure
hi Glomski,
This is the second time I am hearing about memory
allocation problems in 3.7.6, but this time on the brick side. Are
you able to recreate this issue? Will it be possible to get
statedumps of the brick processes just before they crash?
Pranith
On 12/22/2015 02:25 AM, Glomski, Patrick wrote:
Hello,
We've recently upgraded from gluster 3.6.6 to 3.7.6 and
have started encountering dmesg page allocation errors
(stack trace is appended).
It appears that glusterfsd now sometimes fills up the
cache completely and crashes with a page allocation
failure. I *believe* it mainly happens when copying lots
of new data to the system, running a 'find', or similar
operations.
Hosts are all Scientific Linux 6.6, and these errors
occur consistently on two separate gluster pools.
Has anyone else seen this issue, and are there any known
fixes for it via sysctl kernel parameters or other
means?
Please let me know of any other diagnostic
information that would help.
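To clarify what I mean by kernel-parameter workarounds, this is the sort
of tuning I had in mind (purely illustrative; I have not verified that
these values actually help on our systems):

    # reserve more free memory so high-order (order:5) allocations are less likely to fail
    sysctl -w vm.min_free_kbytes=262144

    # reclaim dentry/inode caches more aggressively
    sysctl -w vm.vfs_cache_pressure=200

    # one-off relief: drop clean page/dentry/inode caches
    sync && echo 3 > /proc/sys/vm/drop_caches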
Thanks,
Patrick
[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
[1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1
[1458118.134702] Call Trace:
[1458118.134714] [<ffffffff8113770c>] ? __alloc_pages_nodemask+0x7dc/0x950
[1458118.134728] [<ffffffffa0321800>] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]
[1458118.134733] [<ffffffff81176e92>] ? kmem_getpages+0x62/0x170
[1458118.134735] [<ffffffff81177aaa>] ? fallback_alloc+0x1ba/0x270
[1458118.134736] [<ffffffff811774ff>] ? cache_grow+0x2cf/0x320
[1458118.134738] [<ffffffff81177829>] ? ____cache_alloc_node+0x99/0x160
[1458118.134743] [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134744] [<ffffffff81178479>] ? __kmalloc+0x199/0x230
[1458118.134746] [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134748] [<ffffffff8146001a>] ? __pskb_pull_tail+0x2aa/0x360
[1458118.134751] [<ffffffff8146f389>] ? harmonize_features+0x29/0x70
[1458118.134753] [<ffffffff8146f9f4>] ? dev_hard_start_xmit+0x1c4/0x490
[1458118.134758] [<ffffffff8148cf8a>] ? sch_direct_xmit+0x15a/0x1c0
[1458118.134759] [<ffffffff8146ff68>] ? dev_queue_xmit+0x228/0x320
[1458118.134762] [<ffffffff8147665d>] ? neigh_connected_output+0xbd/0x100
[1458118.134766] [<ffffffff814abc67>] ? ip_finish_output+0x287/0x360
[1458118.134767] [<ffffffff814abdf8>] ? ip_output+0xb8/0xc0
[1458118.134769] [<ffffffff814ab04f>] ? __ip_local_out+0x9f/0xb0
[1458118.134770] [<ffffffff814ab085>] ? ip_local_out+0x25/0x30
[1458118.134772] [<ffffffff814ab580>] ? ip_queue_xmit+0x190/0x420
[1458118.134773] [<ffffffff81137059>] ? __alloc_pages_nodemask+0x129/0x950
[1458118.134776] [<ffffffff814c0c54>] ? tcp_transmit_skb+0x4b4/0x8b0
[1458118.134778] [<ffffffff814c319a>] ? tcp_write_xmit+0x1da/0xa90
[1458118.134779] [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
[1458118.134780] [<ffffffff814c3a80>] ? tcp_push_one+0x30/0x40
[1458118.134782] [<ffffffff814b410c>] ? tcp_sendmsg+0x9cc/0xa20
[1458118.134786] [<ffffffff8145836b>] ? sock_aio_write+0x19b/0x1c0
[1458118.134788] [<ffffffff814581d0>] ? sock_aio_write+0x0/0x1c0
[1458118.134791] [<ffffffff8119169b>] ? do_sync_readv_writev+0xfb/0x140
[1458118.134797] [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
[1458118.134801] [<ffffffff8123e92f>] ? selinux_file_permission+0xbf/0x150
[1458118.134804] [<ffffffff812316d6>] ? security_file_permission+0x16/0x20
[1458118.134806] [<ffffffff81192746>] ? do_readv_writev+0xd6/0x1f0
[1458118.134807] [<ffffffff811928a6>] ? vfs_writev+0x46/0x60
[1458118.134809] [<ffffffff811929d1>] ? sys_writev+0x51/0xd0
[1458118.134812] [<ffffffff810e88ae>] ? __audit_syscall_exit+0x25e/0x290
[1458118.134816] [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel