Urban,
 this bug has already been fixed in the source repository.

thanks,
avati

2007/5/10, Urban Loesch <ul@xxxxxxxx>:
Hi Avati,

thanks for your fast answer.
I am using glusterfs-1.3.0-pre3, downloaded from your server
(http://ftp.zresearch.com/pub/gluster/glusterfs/1.3-pre/).
I will try the latest version from TLA this afternoon and let you know
what happens.

Here's the backtrace from the core dump:

# gdb glusterfsd -c core.15160
..
Core was generated by `glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG'.
Program terminated with signal 11, Segmentation fault.
#0  0xb75d8fd3 in posix_locks_flush ()
   from /usr/lib/glusterfs/1.3.0-pre3/xlator/features/posix-locks.so
(gdb) bt
#0  0xb75d8fd3 in posix_locks_flush ()
   from /usr/lib/glusterfs/1.3.0-pre3/xlator/features/posix-locks.so
#1  0xb75d1192 in fop_flush ()
   from /usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
#2  0xb75cded7 in proto_srv_notify ()
   from /usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
#3  0xb7f54ecd in transport_notify (this=0x804b1a0, event=1) at transport.c:148
#4  0xb7f55b79 in sys_epoll_iteration (ctx=0xbfbc2ff0) at epoll.c:53
#5  0xb7f54f7d in poll_iteration (ctx=0xbfbc2ff0) at transport.c:251
#6  0x0804924e in main ()

Yes, it is reproducible. It happens every time I try to start my
virtual server.

Thanks
Urban

Anand Avati wrote:
> Urban,
>  which version of glusterfs are you using? if it is from TLA checkout,
> what is the patchset number?
>
> you have a core dump generated from the segfault, so can you please get
> a backtrace from it? (gdb glusterfsd -c core.<pid> or gdb glusterfsd -c
> core, type the 'bt' command and paste the output) please.
>
> is this easily reproducible? have you checked with the latest TLA
> checkout?
>
> thanks,
> avati
>
> 2007/5/10, Urban Loesch <ul@xxxxxxxx>:
>> Hi,
>>
>> I'm new to this list.
>> First: sorry for my bad English.
>>
>> I was searching for an easy and transparent cluster filesystem with a
>> failover feature, and on Wikipedia I found the GlusterFS project.
>> It's a nice project and I tried it in my test environment.
>> I thought that if it works well, I would use it in production too.
>>
>> A very nice feature for me is the AFR setup, so I can replicate all the
>> data over 2 servers in RAID-1 mode.
>> But it seems that I am doing something wrong, because "glusterfsd"
>> crashes on both nodes.
>> Let me explain from the beginning.
>>
>> Here's my setup:
>>
>> Hardware:
>> 2 different servers for storage
>> 1 server as client
>> On top of the servers I use a virtual server setup (details:
>> http://linux-vserver.org).
>>
>> OS:
>> Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r:
>> 2.6.19.2-vs2.2.0) and the latest stable virtual server patch.
>> glusterfs-1.3.0-pre3.tar.gz
>>
>> What I'm trying to do:
>> - Create an AFR mirror over the 2 servers.
>> - Mount the volume on server 3 (client).
>> - Install the whole virtual server (Apache, MySQL and so on) on the
>>   mounted volume.
>> So I have a fully redundant virtual server mirrored over two bricks.
>>
>> Here is my current configuration:
>>
>> - Server config on server 1 (brick):
>>
>> ### Export volume "brick" with the contents of "/home/export" directory.
>> volume brick
>>   type storage/posix        # POSIX FS translator
>>   option directory /gluster # Export this directory
>> end-volume
>>
>> ### File Locking
>> volume locks
>>   type features/posix-locks
>>   subvolumes brick
>> end-volume
>>
>> ### Add network serving capability to above brick.
>> volume server
>>   type protocol/server
>>   option transport-type tcp/server # For TCP/IP transport
>>   option listen-port 6996          # Default is 6996
>>   subvolumes locks
>>   option auth.ip.locks.allow *     # access to "brick" volume
>> end-volume
>>
>> - Server config on server 2 (brick-afr):
>>
>> ### Export volume "brick" with the contents of "/home/export" directory.
>> volume brick-afr
>>   type storage/posix            # POSIX FS translator
>>   option directory /gluster-afr # Export this directory
>> end-volume
>>
>> ### File Locking
>> volume locks-afr
>>   type features/posix-locks
>>   subvolumes brick-afr
>> end-volume
>>
>> ### Add network serving capability to above brick.
>> volume server
>>   type protocol/server
>>   option transport-type tcp/server # For TCP/IP transport
>>   option listen-port 6996          # Default is 6996
>>   subvolumes locks-afr
>>   option auth.ip.locks-afr.allow * # access to "brick" volume
>> end-volume
>>
>> - Client configuration on server 3 (client):
>>
>> ### Add client feature and attach to remote subvolume of server1
>> volume brick
>>   type protocol/client
>>   option transport-type tcp/client # for TCP/IP transport
>>   option remote-host 192.168.0.1   # IP address of the remote brick
>>   option remote-port 6996          # default server port is 6996
>>   option remote-subvolume locks    # name of the remote volume
>> end-volume
>>
>> ### Add client feature and attach to remote subvolume of brick1
>> volume brick-afr
>>   type protocol/client
>>   option transport-type tcp/client  # for TCP/IP transport
>>   option remote-host 192.168.0.2    # IP address of the remote brick
>>   option remote-port 6996           # default server port is 6996
>>   option remote-subvolume locks-afr # name of the remote volume
>> end-volume
>>
>> ### Add AFR feature to brick
>> volume afr
>>   type cluster/afr
>>   subvolumes brick brick-afr
>>   option replicate *:2 # All files 2 copies (RAID-1)
>> end-volume
>>
>> ----------------------------------------------------------------------
>>
>> I started the two bricks in debug mode and they start without problems.
>>
>> - Server 1:
>> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
>> ....
>> [May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()] protocol/server:protocol/server xlator loaded
>> [May 10 11:52:11] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/server
>> [May 10 11:52:11] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
>>
>> - Server 2:
>> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
>> ....
>> [May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()] protocol/server:protocol/server xlator loaded
>> [May 10 11:51:44] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/server
>> [May 10 11:51:44] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
>> ----------------------------------------------------------------------
>>
>> So far so good.
>>
>> Then I mounted the volume on server 3 (client). It mounts without any
>> problems.
>> glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG --spec-file=/etc/glusterfs/glusterfs-client.vol /var/lib/vservers/mastersql
>> ...
>> [May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()] protocol/client:defaulting transport-timeout to 120
>> [May 10 13:59:00] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/client
>> [May 10 13:59:00] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
>> [May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 8
>> [May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
>> [May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 8 in progress (non-blocking)
>> [May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 8 still in progress - try later
>>
>> OK. Nice.
>> A short check on the client:
>>
>> df -HT
>> Filesystem      Type   Size  Used Avail Use% Mounted on
>> /dev/sda1       ext3    13G  2.6G  8.9G  23% /
>> tmpfs           tmpfs  1.1G     0  1.1G   0% /lib/init/rw
>> udev            tmpfs   11M   46k   11M   1% /dev
>> tmpfs           tmpfs  1.1G     0  1.1G   0% /dev/shm
>> glusterfs:24914 fuse   9.9G  2.5G  6.9G  27% /var/lib/vservers/mastersql
>>
>> Wow, it works. Now I can add, remove or edit files and directories
>> without problems. The files are written to both bricks without
>> problems. Performance is good too.
>>
>> But then I tried to start my virtual server (called mastersql).
>> The virtual server does not start and I get a lot of the following
>> debug output on the client:
>>
>> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 4
>> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1023'
>> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 4 in progress (non-blocking)
>> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 4 still in progress - try later
>> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
>> [May 10 14:04:43] [DEBUG/client-protocol.c:2604/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x8076cf0
>> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 7
>> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
>> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 7 in progress (non-blocking)
>> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 7 still in progress - try later
>> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
>> [May 10 14:04:43] [DEBUG/client-protocol.c:2604/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x80762d0
>>
>> The two mirror servers crash with the following debug output:
>>
>> [May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()] tcp/server:Registering socket (5) for new transport object of 192.168.0.3
>> [May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()] server-protocol:mop_setvolume: received port = 1022
>> [May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()] server-protocol:mop_setvolume: IP addr = *,
>> received ip addr = 192.168.0.3
>> [May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()] server-protocol:mop_setvolume: accepted client from 192.168.0.3
>>
>> Trying to set: READ    Is grantable: READ    Inserting: READ
>> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: READ
>> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
>> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
>> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
>> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
>> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
>> [May 10 12:00:09] [CRITICAL/common-utils.c:215/gf_print_trace()] debug-backtrace:Got signal (11), printing backtrace
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f53a7e]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[0xb7f60420]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so [0xb75d1192]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so [0xb75cded7]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d) [0xb7f54ecd]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9) [0xb7f55b79]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f54f7d]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:glusterfsd [0x804924e]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7e17ea8]
>> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:glusterfsd [0x8048c51]
>> Segmentation fault (core dumped)
>>
>> It seems that there are some conflicts with "READ, WRITE, UNLOCK". But
>> I'm not an expert on filesystems and locking features.
>>
>> As you can see, the filesystem is still mounted but no longer connected
>> to the two bricks:
>>
>> df -HT
>> Filesystem Type   Size  Used Avail Use% Mounted on
>> /dev/sda1  ext3    13G  2.6G  8.9G  23% /
>> tmpfs      tmpfs  1.1G     0  1.1G   0% /lib/init/rw
>> udev       tmpfs   11M   46k   11M   1% /dev
>> tmpfs      tmpfs  1.1G     0  1.1G   0% /dev/shm
>> df: `/var/lib/vservers/mastersql': Transport endpoint is not connected
>>
>> I'm not sure if I am doing something wrong (configuration) or if it is
>> a bug!
>> Can you experts please help me?
>>
>> If you need any further information please let me know.
>>
>> Thanks and regards
>> Urban
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@xxxxxxxxxx
>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
-- Anand V. Avati
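
[A side note on the quoted server volumes, unrelated to the crash: `option auth.ip.locks.allow *` accepts connections from any address. If the bricks are reachable from outside the storage network, restricting the pattern to the client's address is safer. A sketch for server 1, assuming 192.168.0.3 is the only client (as in the thread):]

```
volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6996
  subvolumes locks
  option auth.ip.locks.allow 192.168.0.3 # only the known client, not *
end-volume
```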