I created an 8x1x2 distributed-replicated volume, started Oracle with Direct NFS enabled, then fired up a load generator. It went quite well for a while, then suddenly crashed. Nothing fancy, just a lot of load.
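Something along these lines (volume name, hostnames and brick paths below are made-up placeholders, not the actual layout):

    # 16 bricks at replica 2 = 8 distribute subvolumes of 2 replicas each
    gluster volume create oradata replica 2 \
        node1:/bricks/b1 node2:/bricks/b1 \
        node1:/bricks/b2 node2:/bricks/b2 \
        node1:/bricks/b3 node2:/bricks/b3 \
        node1:/bricks/b4 node2:/bricks/b4 \
        node1:/bricks/b5 node2:/bricks/b5 \
        node1:/bricks/b6 node2:/bricks/b6 \
        node1:/bricks/b7 node2:/bricks/b7 \
        node1:/bricks/b8 node2:/bricks/b8
    gluster volume start oradata

Oracle's Direct NFS client then gets pointed at the volume's built-in NFS export (typically via oranfstab), and the load generator runs against that.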
M.

On 13-05-02 12:33 AM, Pranith Kumar Karampuri wrote:
> Michael,
>     Could you let us know the steps to re-create the issue?
>
> Pranith
>
> ----- Original Message -----
>> From: "Michael Brown" <michael@xxxxxxxxxxxx>
>> To: gluster-devel@xxxxxxxxxx
>> Sent: Wednesday, May 1, 2013 10:57:59 PM
>> Subject: v3.4.0a3+ NFS crashing out
>>
>> My gluster NFS daemon is crashing with the following:
>>
>> pending frames:
>> <<<25592 copies of>>>
>> frame : type(0) op(0)
>>
>> patchset: git://git.gluster.com/glusterfs.git
>> signal received: 11
>> time of crash: 2013-05-01 17:02:36
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> fdatasync 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 3.4git
>> /usr/local/glusterfs/sbin/glusterfs(glusterfsd_print_trace+0x1f)[0x407bd5]
>> /lib64/libc.so.6[0x3c48c32920]
>> /lib64/libc.so.6[0x3c48c7870a]
>> /usr/local/glusterfs/lib/libglusterfs.so.0(__gf_free+0x61)[0x7f80421665a9]
>> /usr/local/glusterfs/lib/libglusterfs.so.0(mem_put+0x212)[0x7f8042166fd8]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_writev_done+0xca)[0x7f803d8cf9ec]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(+0x58d7f)[0x7f803d900d7f]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(+0x58f09)[0x7f803d900f09]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(+0x59214)[0x7f803d901214]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_unlock+0x57)[0x7f803d905aeb]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_changelog_post_op_cbk+0x10a)[0x7f803d8dd01f]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_changelog_post_op_now+0x8c7)[0x7f803d8ddebf]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_delayed_changelog_post_op+0x16e)[0x7f803d8e1f36]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_changelog_post_op+0x59)[0x7f803d8e1f99]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_transaction_resume+0x87)[0x7f803d8e205e]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/cluster/replicate.so(afr_writev_wind_cbk+0x348)[0x7f803d8cf468]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/xlator/protocol/client.so(client3_3_writev_cbk+0x490)[0x7f803db53397]
>> /usr/local/glusterfs/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1b5)[0x7f8041f14759]
>> /usr/local/glusterfs/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3)[0x7f8041f14af0]
>> /usr/local/glusterfs/lib/libgfrpc.so.0(rpc_transport_notify+0x110)[0x7f8041f1118c]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7f803e9a40a9]
>> /usr/local/glusterfs/lib/glusterfs/3.4git/rpc-transport/socket.so(socket_event_handler+0x1c4)[0x7f803e9a4558]
>> /usr/local/glusterfs/lib/libglusterfs.so.0(+0x72441)[0x7f8042190441]
>> /usr/local/glusterfs/lib/libglusterfs.so.0(+0x72630)[0x7f8042190630]
>> /usr/local/glusterfs/lib/libglusterfs.so.0(event_dispatch+0x6c)[0x7f8042165af3]
>> /usr/local/glusterfs/sbin/glusterfs(main+0x2c7)[0x408503]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3c48c1ecdd]
>> /usr/local/glusterfs/sbin/glusterfs[0x404649]
>> ---------
>>
>> It rather looks like the NFS code isn't freeing up NULL frames from the
>> frame stack (if those words are right :D) when it's done replying to them.
>>
>> Yes, Oracle does send quite a few. Up until that point, it was behaving
>> REALLY well :)
>>
>> M.
>>
>> --
>> Michael Brown            | `One of the main causes of the fall of
>> Systems Consultant       | the Roman Empire was that, lacking zero,
>> Net Direct Inc.          | they had no way to indicate successful
>> ☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@xxxxxxxxxx
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>>

--
Michael Brown            | `One of the main causes of the fall of
Systems Consultant       | the Roman Empire was that, lacking zero,
Net Direct Inc.          | they had no way to indicate successful
☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth
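[Aside: the pending-frame count summarized as "<<<25592 copies of>>>" in the quoted dump can be reproduced straight from the crash dump in the NFS server's log, e.g.:

    # log path is a guess and depends on the install prefix; adjust to taste
    grep -c 'frame : type(0) op(0)' /var/log/glusterfs/nfs.log

A count that large for type(0) op(0) frames is what points at replies completing without their frames being cleaned up.]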