glusterfs 3.1.1 rdma module crashing when mounting volume

Hi all,

We have a small HPC cluster, and I tried to harness the spare disk space
of our compute nodes to take some load off the cluster's NFS server.

I started off using glusterfs 3.0.x as packaged with Debian Squeeze (all
nodes use this version).

I tried updating to glusterfs 3.1.x using the prepackaged files [1]
from gluster.org, but found out I was no longer able to use the
Infiniband interconnect, because the packages seem to be compiled
without rdma support.
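
One way to check whether an installed package ships the rdma transport at all is to look for rdma.so among the transport modules. The following is only a minimal sketch: the directory is the one that appears in the backtrace [2] for the self-built 3.1.1 package, other builds may install elsewhere, and the function names are hypothetical helpers, not part of glusterfs.

```python
from pathlib import Path

# Assumption: transport modules live in the directory seen in the backtrace [2].
# Prebuilt packages may use a different prefix or version directory.
DEFAULT_TRANSPORT_DIR = Path("/usr/lib/glusterfs/3.1.1/rpc-transport")


def available_transports(transport_dir=DEFAULT_TRANSPORT_DIR):
    """Return the sorted names of transport modules (*.so) found in transport_dir."""
    directory = Path(transport_dir)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.so"))


def has_rdma(transport_dir=DEFAULT_TRANSPORT_DIR):
    """True if an rdma.so module is present in transport_dir."""
    return "rdma" in available_transports(transport_dir)


if __name__ == "__main__":
    print("transports:", available_transports())
    print("rdma available:", has_rdma())
```

A missing rdma.so only shows that the module was not packaged; a present one says nothing about whether it initializes correctly on a given HCA.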

To get the faster interconnect back, I repackaged glusterfs 3.1.1 from
the source tarball and installed it on all nodes. However, the rdma
transport crashes when mounting a volume on the head node [2], while it
works fine from the compute nodes. The only significant difference with
respect to InfiniBand is that the head node uses a different NIC:

Work Nodes:
02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0
5GT/s - IB QDR / 10GigE] (rev b0)

Head Node:
06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
HCA] (rev 20)


If anyone has an idea how to get this working, please let me know.

Regards,

Jörg Blank


[1] http://download.gluster.com/pub/gluster/glusterfs/3.1/LATEST/Debian/

[2] Backtrace from logs:

[2010-12-24 22:45:11.516902] W [io-stats.c:1644:init] test-volume:
dangling volume. check volfile
[2010-12-24 22:45:11.516943] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-24 22:45:11.516955] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-24 22:45:11.527333] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-24 22:45:11.527529] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: test-volume-client-1: could not create CQ
[2010-12-24 22:45:11.527541] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-24 22:45:11.527611] E [rdma.c:4789:init] test-volume-client-1:
Failed to initialize IB Device
[2010-12-24 22:45:11.527623] E [rpc-transport.c:971:rpc_transport_load]
rpc-transport: 'rdma' initialization failed
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2010-12-24 22:45:11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib/libc.so.6(+0x321e0)[0x7f8e2c3ea1e0]
/lib/libc.so.6(+0x7a126)[0x7f8e2c432126]
/usr/lib/glusterfs/3.1.1/rpc-transport/rdma.so(init+0x37c)[0x7f8e28956e7c]
/usr/lib/libgfrpc.so.0(rpc_transport_load+0x365)[0x7f8e2cd5a035]
/usr/lib/libgfrpc.so.0(rpc_clnt_new+0xf9)[0x7f8e2cd5de59]
/usr/lib/glusterfs/3.1.1/xlator/protocol/client.so(client_init_rpc+0xa9)[0x7f8e29c32b09]
/usr/lib/glusterfs/3.1.1/xlator/protocol/client.so(init+0xf1)[0x7f8e29c32cb1]
/usr/lib/libglusterfs.so.0(xlator_init+0x58)[0x7f8e2cf7c978]
/usr/lib/libglusterfs.so.0(glusterfs_graph_init+0x35)[0x7f8e2cfa5b05]
/usr/lib/libglusterfs.so.0(glusterfs_graph_activate+0x38)[0x7f8e2cfa5c48]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xba)[0x40447a]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc7)[0x405cc7]
/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f8e2cd5cb75]
/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)[0x7f8e2cd5cdc9]
/usr/lib/libgfrpc.so.0(rpc_transport_notify+0x2d)[0x7f8e2cd57d7d]
/usr/lib/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f8e2a870c94]
/usr/lib/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0xb3)[0x7f8e2a870d63]
/usr/lib/libglusterfs.so.0(+0x3a272)[0x7f8e2cf9d272]
/usr/sbin/glusterfs(main+0x247)[0x4054c7]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f8e2c3d6c4d]
/usr/sbin/glusterfs[0x403179]
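
To make the failure chain in the log above easier to read, the error-level entries can be filtered out and their wrapped continuation lines rejoined. This is a minimal sketch that assumes the entry layout seen in [2] (timestamp, level letter, [file:line:function], message); `parse_entries` and `error_chain` are hypothetical helper names, not glusterfs tooling.

```python
import re

# Entry header as it appears in [2]:
#   [timestamp] LEVEL [file:line:function] message
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s*(?P<msg>.*)$"
)


def parse_entries(text):
    """Parse log text into dicts, rejoining messages wrapped onto the next line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY.match(line)
        if match:
            entries.append(match.groupdict())
        elif line == "pending frames:":
            break  # the crash dump starts here; stop parsing log entries
        elif entries and line:
            # Continuation of the previous entry's message.
            last = entries[-1]
            last["msg"] = (last["msg"] + " " + line).strip()
    return entries


def error_chain(text):
    """Return only the entries logged at level E, in order of appearance."""
    return [e for e in parse_entries(text) if e["level"] == "E"]


if __name__ == "__main__":
    import sys
    for entry in error_chain(sys.stdin.read()):
        print(f'{entry["file"]}:{entry["line"]} {entry["func"]}: {entry["msg"]}')
```

Fed the log from [2], this lists the entries from rdma_create_cq through rpc_transport_load in order, which shows that the first reported failure is the creation of send_cq on mthca0.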


------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDirig Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------

