Urban,

Which version of glusterfs are you using? If it is from a TLA checkout, what is the patchset number?

You have a core dump generated from the segfault; can you please get a backtrace from it and paste the output:

    gdb glusterfsd -c core.<pid>    (or: gdb glusterfsd -c core)
    (gdb) bt

Is this easily reproducible? Have you checked with the latest TLA checkout?

thanks,
avati

2007/5/10, Urban Loesch <ul@xxxxxxxx>:
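[For reference, the same backtrace can also be captured non-interactively; a minimal sketch, assuming gdb is installed on the server and the core file is named `core` in the current directory:]

```
# Load the crashed binary together with its core dump, print the
# backtrace of the faulting thread, and exit.
gdb --batch -ex bt glusterfsd core
```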
Hi,

I'm new to this list. First: sorry for my bad English.

I was searching for an easy and transparent cluster filesystem with a failover feature, and I found the GlusterFS project on Wikipedia. It's a nice project and I tried it in my test environment; if it works well, I will use it in production too. A very nice feature for me is the AFR setup, so I can replicate all the data over 2 servers in RAID-1 mode. But it seems that I am doing something wrong, because "glusterfsd" crashes on both nodes. Let me explain from the beginning.

Here's my setup:

Hardware:
- 2 different servers for storage
- 1 server as client

On top of the servers I use a virtual server setup (details: http://linux-vserver.org).

OS: Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r: 2.6.19.2-vs2.2.0) and the latest stable virtual server patch.
glusterfs-1.3.0-pre3.tar.gz

What I'm trying to do:
- Create an AFR mirror over the 2 servers.
- Mount the volume on server 3 (the client).
- Install the whole virtual server (Apache, MySQL and so on) on the mounted volume.

So I have a fully redundant virtual server mirrored over two bricks.

Here is my current configuration:

- Server config on server 1 (brick):

    ### Export volume "brick" with the contents of "/home/export" directory.
    volume brick
      type storage/posix          # POSIX FS translator
      option directory /gluster   # Export this directory
    end-volume

    ### File Locking
    volume locks
      type features/posix-locks
      subvolumes brick
    end-volume

    ### Add network serving capability to above brick.
    volume server
      type protocol/server
      option transport-type tcp/server   # For TCP/IP transport
      option listen-port 6996            # Default is 6996
      subvolumes locks
      option auth.ip.locks.allow *       # access to "brick" volume
    end-volume

- Server config on server 2 (brick-afr):

    ### Export volume "brick" with the contents of "/home/export" directory.
    volume brick-afr
      type storage/posix              # POSIX FS translator
      option directory /gluster-afr   # Export this directory
    end-volume

    ### File Locking
    volume locks-afr
      type features/posix-locks
      subvolumes brick-afr
    end-volume

    ### Add network serving capability to above brick.
    volume server
      type protocol/server
      option transport-type tcp/server   # For TCP/IP transport
      option listen-port 6996            # Default is 6996
      subvolumes locks-afr
      option auth.ip.locks-afr.allow *   # access to "brick" volume
    end-volume

- Client configuration on server 3:

    ### Add client feature and attach to remote subvolume of server1
    volume brick
      type protocol/client
      option transport-type tcp/client   # for TCP/IP transport
      option remote-host 192.168.0.1     # IP address of the remote brick
      option remote-port 6996            # default server port is 6996
      option remote-subvolume locks      # name of the remote volume
    end-volume

    ### Add client feature and attach to remote subvolume of brick1
    volume brick-afr
      type protocol/client
      option transport-type tcp/client   # for TCP/IP transport
      option remote-host 192.168.0.2     # IP address of the remote brick
      option remote-port 6996            # default server port is 6996
      option remote-subvolume locks-afr  # name of the remote volume
    end-volume

    ### Add AFR feature to brick
    volume afr
      type cluster/afr
      subvolumes brick brick-afr
      option replicate *:2   # All files 2 copies (RAID-1)
    end-volume

----------------------------------------------------------------------

I started the two bricks in debug mode and they start without problems.

- Server 1:

    glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
    ....
    [May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()] protocol/server:protocol/server xlator loaded
    [May 10 11:52:11] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/server
    [May 10 11:52:11] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so

- Server 2:

    glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
    ....
    [May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()] protocol/server:protocol/server xlator loaded
    [May 10 11:51:44] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/server
    [May 10 11:51:44] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so

----------------------------------------------------------------------

So far so good. Then I mounted the volume on server 3 (the client). It mounts without any problems.

    glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG --spec-file=/etc/glusterfs/glusterfs-client.vol /var/lib/vservers/mastersql
    ...
    [May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()] protocol/client:defaulting transport-timeout to 120
    [May 10 13:59:00] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/client
    [May 10 13:59:00] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
    [May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 8
    [May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
    [May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 8 in progress (non-blocking)
    [May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 8 still in progress - try later

OK. Nice.
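[Editorial aside, not part of the original mail: before mounting, it can help to verify from the client that both bricks' listen port is reachable. A minimal sketch using bash's built-in /dev/tcp pseudo-device, with the addresses and port 6996 taken from the configs above; this only checks the TCP handshake, not the GlusterFS protocol:]

```shell
# Probe each brick server on the configured listen port (6996).
# timeout(1) bounds the connect attempt; /dev/tcp fails if the
# TCP connection cannot be established.
for host in 192.168.0.1 192.168.0.2; do
    if timeout 2 bash -c ">/dev/tcp/$host/6996" 2>/dev/null; then
        echo "$host:6996 reachable"
    else
        echo "$host:6996 NOT reachable"
    fi
done
```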
A short check on the client:

    df -HT
    Filesystem       Type    Size  Used Avail Use% Mounted on
    /dev/sda1        ext3     13G  2.6G  8.9G  23% /
    tmpfs            tmpfs   1.1G     0  1.1G   0% /lib/init/rw
    udev             tmpfs    11M   46k   11M   1% /dev
    tmpfs            tmpfs   1.1G     0  1.1G   0% /dev/shm
    glusterfs:24914  fuse    9.9G  2.5G  6.9G  27% /var/lib/vservers/mastersql

Wow, it works. Now I can add, remove or edit files and directories without problems. The files are written to both bricks without problems. Performance is good too.

But then I tried to start my virtual server (called mastersql). The virtual server does not start, and I get a lot of the following debug output on the client:

    [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 4
    [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1023'
    [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 4 in progress (non-blocking)
    [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 4 still in progress - try later
    [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
    [May 10 14:04:43] [DEBUG/client-protocol.c:2604/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x8076cf0
    [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 7
    [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
    [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 7 in progress (non-blocking)
    [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 7 still in progress - try later
    [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
    [May 10 14:04:43] [DEBUG/client-protocol.c:2604/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x80762d0

The two
mirror servers crash with the following debug output:

    [May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()] tcp/server:Registering socket (5) for new transport object of 192.168.0.3
    [May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()] server-protocol:mop_setvolume: received port = 1022
    [May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()] server-protocol:mop_setvolume: IP addr = *, received ip addr = 192.168.0.3
    [May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()] server-protocol:mop_setvolume: accepted client from 192.168.0.3
    Trying to set: READ
    Is grantable: READ
    Inserting: READ
    Trying to set: UNLOCK
    Is grantable: UNLOCK
    Conflict with: READ
    Trying to set: WRITE
    Is grantable: WRITE
    Inserting: WRITE
    Trying to set: UNLOCK
    Is grantable: UNLOCK
    Conflict with: WRITE
    Trying to set: WRITE
    Is grantable: WRITE
    Inserting: WRITE
    Trying to set: UNLOCK
    Is grantable: UNLOCK
    Conflict with: WRITE
    Trying to set: WRITE
    Is grantable: WRITE
    Inserting: WRITE
    [May 10 12:00:09] [CRITICAL/common-utils.c:215/gf_print_trace()] debug-backtrace:Got signal (11), printing backtrace
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f53a7e]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[0xb7f60420]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so [0xb75d1192]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so [0xb75cded7]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d) [0xb7f54ecd]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9) [0xb7f55b79]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
    debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f54f7d]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:glusterfsd [0x804924e]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7e17ea8]
    [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:glusterfsd [0x8048c51]
    Segmentation fault (core dumped)

It seems that there are some conflicts with "READ, WRITE, UNLOCK", but I'm not an expert on filesystems and locking features.

As you can see, the filesystem is still mounted but no longer connected to the two bricks:

    df -HT
    Filesystem       Type    Size  Used Avail Use% Mounted on
    /dev/sda1        ext3     13G  2.6G  8.9G  23% /
    tmpfs            tmpfs   1.1G     0  1.1G   0% /lib/init/rw
    udev             tmpfs    11M   46k   11M   1% /dev
    tmpfs            tmpfs   1.1G     0  1.1G   0% /dev/shm
    df: `/var/lib/vservers/mastersql': Transport endpoint is not connected

I'm not sure if I'm doing something wrong (configuration) or if it is a bug. Can you experts please help me? If you need any further information, please let me know.

Thanks and regards
Urban

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel
-- Anand V. Avati