This is the core file from the crash just now:

[root@psanaoss213 /]# ls -al core*
-rw------- 1 root root 4073594880 Jun  8 15:05 core.22682

From yesterday:

[root@psanaoss214 /]# ls -al core*
-rw------- 1 root root 4362727424 Jun  8 00:58 core.13483
-rw------- 1 root root 4624773120 Jun  8 03:21 core.8792

On 06/08/2012 04:34 PM, Anand Avati wrote:
> Is it possible the system was running low on memory? I see you have
> 48GB, but a memory registration failure is typically because the
> system limit on the number of pinnable pages in RAM was hit. Can you
> tell us the size of your core dump files after the crash?
>
> Avati
>
> On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
>
> Hello,
>
> I have a brick that crashed twice today, and another, different
> brick that crashed just a while ago.
>
> This is what I see in one of the brick logs:
>
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash: 2012-06-08 15:05:11
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.2.6
> /lib64/libc.so.6[0x34bc032900]
> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
> /lib64/libc.so.6[0x34bc06f977]
> /lib64/libc.so.6[0x34bc075296]
> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
> /lib64/libpthread.so.0[0x34bc8077f1]
> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
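Since the core files above were preserved, the actual abort site can be recovered from them with gdb. The sketch below is a generic way to dump every thread's stack from a core; the core file name and the glusterfsd binary path are taken from the logs above and should be substituted with your own (verify the producing binary with `file core.22682`).

```shell
#!/bin/sh
# Sketch: extract backtraces from a brick core dump with gdb.
# CORE and BIN are taken from the report above; adjust for your system.
CORE=core.22682
BIN=/opt/glusterfs/3.2.6/sbin/glusterfsd

if [ -f "$CORE" ] && command -v gdb >/dev/null 2>&1; then
    # -batch: run the -ex commands non-interactively and exit.
    # "thread apply all bt" prints the stack of every thread, which is
    # useful here since the abort came from a transport cleanup thread.
    gdb -batch -ex 'thread apply all bt' "$BIN" "$CORE"
else
    echo "core file or gdb not available"
fi
```

With the matching debuginfo packages installed, the frames inside `__gf_free` and `rpc_transport_destroy` should resolve to source lines, which narrows down whether this is a double free on the rdma transport teardown path.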
> ---------
>
> And somewhere before these, there is also:
>
> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
> 0-rpc-transport/rdma: memory registration failed
>
> I have 48GB of memory on the system:
>
> # free
>              total       used       free     shared    buffers     cached
> Mem:      49416716   34496648   14920068          0      31692   28209612
> -/+ buffers/cache:    6255344   43161372
> Swap:      4194296       1740    4192556
>
> # uname -a
> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The server gluster version is 3.2.6-1. I have both rdma clients and
> tcp clients over a 10Gb/s network.
>
> Any suggestion what I should look for?
>
> Is there a way to restart just the brick, and not glusterd, on the
> server? I have 8 bricks on the server.
>
> Thanks,
> ...
> ling
>
>
> Here's the volume info:
>
> # gluster volume info
>
> Volume Name: ana12
> Type: Distribute
> Status: Started
> Number of Bricks: 40
> Transport-type: tcp,rdma
> Bricks:
> Brick1: psanaoss214:/brick1
> Brick2: psanaoss214:/brick2
> Brick3: psanaoss214:/brick3
> Brick4: psanaoss214:/brick4
> Brick5: psanaoss214:/brick5
> Brick6: psanaoss214:/brick6
> Brick7: psanaoss214:/brick7
> Brick8: psanaoss214:/brick8
> Brick9: psanaoss211:/brick1
> Brick10: psanaoss211:/brick2
> Brick11: psanaoss211:/brick3
> Brick12: psanaoss211:/brick4
> Brick13: psanaoss211:/brick5
> Brick14: psanaoss211:/brick6
> Brick15: psanaoss211:/brick7
> Brick16: psanaoss211:/brick8
> Brick17: psanaoss212:/brick1
> Brick18: psanaoss212:/brick2
> Brick19: psanaoss212:/brick3
> Brick20: psanaoss212:/brick4
> Brick21: psanaoss212:/brick5
> Brick22: psanaoss212:/brick6
> Brick23: psanaoss212:/brick7
> Brick24: psanaoss212:/brick8
> Brick25: psanaoss213:/brick1
> Brick26: psanaoss213:/brick2
> Brick27: psanaoss213:/brick3
> Brick28: psanaoss213:/brick4
> Brick29: psanaoss213:/brick5
> Brick30: psanaoss213:/brick6
> Brick31: psanaoss213:/brick7
> Brick32: psanaoss213:/brick8
> Brick33: psanaoss215:/brick1
> Brick34: psanaoss215:/brick2
> Brick35: psanaoss215:/brick4
> Brick36: psanaoss215:/brick5
> Brick37: psanaoss215:/brick7
> Brick38: psanaoss215:/brick8
> Brick39: psanaoss215:/brick3
> Brick40: psanaoss215:/brick6
> Options Reconfigured:
> performance.io-thread-count: 16
> performance.write-behind-window-size: 16MB
> performance.cache-size: 1GB
> nfs.disable: on
> performance.cache-refresh-timeout: 1
> network.ping-timeout: 42
> performance.cache-max-file-size: 1PB
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
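Following up on Avati's point about pinnable pages: RDMA memory registration pins pages in RAM, and the amount a process may pin is capped by the locked-memory rlimit (RLIMIT_MEMLOCK), not by total RAM, so the `rdma_new_post` failure can occur even with 14GB free. The sketch below checks the relevant limits; the `glusterfsd` process name match is an assumption about how the bricks appear in `ps` on this system.

```shell
#!/bin/sh
# Sketch: inspect the locked-memory ("memlock") limits that RDMA
# memory registration is subject to.

# Current shell's limit, in KB, or "unlimited".
ulimit -l

# PAM-configured limits, if any (file may not exist on all systems).
grep -i memlock /etc/security/limits.conf 2>/dev/null

# A running brick inherits its limit from whatever started it, so check
# the live process too (per-process limits are exposed in /proc on
# kernels >= 2.6.24). The process-name match is an assumption.
PID=$(pgrep -f glusterfsd 2>/dev/null | head -n 1)
if [ -n "$PID" ] && [ -r "/proc/$PID/limits" ]; then
    grep 'locked memory' "/proc/$PID/limits"
fi
```

If the brick's "Max locked memory" is small (a common default is 64KB), the usual remedy suggested for RDMA workloads is to raise memlock to unlimited in limits.conf or in the init script that launches the bricks, then restart the affected brick processes.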