This is the core file from the crash just now:

[root@psanaoss213 /]# ls -al core*
-rw------- 1 root root 4073594880 Jun  8 15:05 core.22682

From yesterday:

[root@psanaoss214 /]# ls -al core*
-rw------- 1 root root 4362727424 Jun  8 00:58 core.13483
-rw------- 1 root root 4624773120 Jun  8 03:21 core.8792

On 06/08/2012 04:34 PM, Anand Avati wrote:
> Is it possible the system was running low on memory? I see you have
> 48GB, but a memory registration failure is typically because the
> system limit on the number of pinnable pages in RAM was hit. Can you
> tell us the size of your core dump files after the crash?
>
> Avati
>
> On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
>
> Hello,
>
> I have a brick that crashed twice today, and another, different
> brick that crashed just a while ago.
>
> This is what I see in one of the brick logs:
>
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash: 2012-06-08 15:05:11
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.2.6
> /lib64/libc.so.6[0x34bc032900]
> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
> /lib64/libc.so.6[0x34bc06f977]
> /lib64/libc.so.6[0x34bc075296]
> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
> /lib64/libpthread.so.0[0x34bc8077f1]
> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
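Since the core files above were preserved, the actual abort site can be recovered from them with gdb. The sketch below is a generic way to dump every thread's stack from a core; the core file name and the glusterfsd binary path are taken from the logs above and should be substituted with your own (verify the producing binary with `file core.22682`).

```shell
#!/bin/sh
# Sketch: extract backtraces from a brick core dump with gdb.
# CORE and BIN are taken from the report above; adjust for your system.
CORE=core.22682
BIN=/opt/glusterfs/3.2.6/sbin/glusterfsd

if [ -f "$CORE" ] && command -v gdb >/dev/null 2>&1; then
    # -batch: run the -ex commands non-interactively and exit.
    # "thread apply all bt" prints the stack of every thread, which is
    # useful here since the abort came from a transport cleanup thread.
    gdb -batch -ex 'thread apply all bt' "$BIN" "$CORE"
else
    echo "core file or gdb not available"
fi
```

With the matching debuginfo packages installed, the frames inside `__gf_free` and `rpc_transport_destroy` should resolve to source lines, which narrows down whether this is a double free on the rdma transport teardown path.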
> ---------
>
> And somewhere before these, there is also:
>
> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
> 0-rpc-transport/rdma: memory registration failed
>
> I have 48GB of memory on the system:
>
> # free
>              total       used       free     shared    buffers     cached
> Mem:      49416716   34496648   14920068          0      31692   28209612
> -/+ buffers/cache:    6255344   43161372
> Swap:      4194296       1740    4192556
>
> # uname -a
> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The server gluster version is 3.2.6-1. I have both rdma clients and
> tcp clients over a 10Gb/s network.
>
> Any suggestion what I should look for?
>
> Is there a way to restart just the brick, and not glusterd, on the
> server? I have 8 bricks on the server.
>
> Thanks,
> ...
> ling
>
>
> Here's the volume info:
>
> # gluster volume info
>
> Volume Name: ana12
> Type: Distribute
> Status: Started
> Number of Bricks: 40
> Transport-type: tcp,rdma
> Bricks:
> Brick1: psanaoss214:/brick1
> Brick2: psanaoss214:/brick2
> Brick3: psanaoss214:/brick3
> Brick4: psanaoss214:/brick4
> Brick5: psanaoss214:/brick5
> Brick6: psanaoss214:/brick6
> Brick7: psanaoss214:/brick7
> Brick8: psanaoss214:/brick8
> Brick9: psanaoss211:/brick1
> Brick10: psanaoss211:/brick2
> Brick11: psanaoss211:/brick3
> Brick12: psanaoss211:/brick4
> Brick13: psanaoss211:/brick5
> Brick14: psanaoss211:/brick6
> Brick15: psanaoss211:/brick7
> Brick16: psanaoss211:/brick8
> Brick17: psanaoss212:/brick1
> Brick18: psanaoss212:/brick2
> Brick19: psanaoss212:/brick3
> Brick20: psanaoss212:/brick4
> Brick21: psanaoss212:/brick5
> Brick22: psanaoss212:/brick6
> Brick23: psanaoss212:/brick7
> Brick24: psanaoss212:/brick8
> Brick25: psanaoss213:/brick1
> Brick26: psanaoss213:/brick2
> Brick27: psanaoss213:/brick3
> Brick28: psanaoss213:/brick4
> Brick29: psanaoss213:/brick5
> Brick30: psanaoss213:/brick6
> Brick31: psanaoss213:/brick7
> Brick32: psanaoss213:/brick8
> Brick33: psanaoss215:/brick1
> Brick34: psanaoss215:/brick2
> Brick35: psanaoss215:/brick4
> Brick36: psanaoss215:/brick5
> Brick37: psanaoss215:/brick7
> Brick38: psanaoss215:/brick8
> Brick39: psanaoss215:/brick3
> Brick40: psanaoss215:/brick6
> Options Reconfigured:
> performance.io-thread-count: 16
> performance.write-behind-window-size: 16MB
> performance.cache-size: 1GB
> nfs.disable: on
> performance.cache-refresh-timeout: 1
> network.ping-timeout: 42
> performance.cache-max-file-size: 1PB
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
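Following up on Avati's point about pinnable pages: RDMA memory registration pins pages in RAM, and the amount a process may pin is capped by the locked-memory rlimit (RLIMIT_MEMLOCK), not by total RAM, so the `rdma_new_post` failure can occur even with 14GB free. The sketch below checks the relevant limits; the `glusterfsd` process name match is an assumption about how the bricks appear in `ps` on this system.

```shell
#!/bin/sh
# Sketch: inspect the locked-memory ("memlock") limits that RDMA
# memory registration is subject to.

# Current shell's limit, in KB, or "unlimited".
ulimit -l

# PAM-configured limits, if any (file may not exist on all systems).
grep -i memlock /etc/security/limits.conf 2>/dev/null

# A running brick inherits its limit from whatever started it, so check
# the live process too (per-process limits are exposed in /proc on
# kernels >= 2.6.24). The process-name match is an assumption.
PID=$(pgrep -f glusterfsd 2>/dev/null | head -n 1)
if [ -n "$PID" ] && [ -r "/proc/$PID/limits" ]; then
    grep 'locked memory' "/proc/$PID/limits"
fi
```

If the brick's "Max locked memory" is small (a common default is 64KB), the usual remedy suggested for RDMA workloads is to raise memlock to unlimited in limits.conf or in the init script that launches the bricks, then restart the affected brick processes.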