URL: <http://savannah.nongnu.org/bugs/?20200>
Summary: Segfault on file not found
Project: Gluster
Submitted by: hook
Submitted on: Monday 06/18/2007 at 07:08
Category: GlusterFS
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: Crash
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Operating System: GNU/Linux

_______________________________________________________

Details:

When testing GlusterFS (glusterfs-1.3.0-pre4) with some traffic from our production servers, the system crashed (and started returning 'Transport endpoint is not connected'). This happens regularly, but I cannot track down why. The core file indicates the glusterfs client is segfaulting because a file cannot be found.

-- 8<-- output from gdb glusterfs /core.10469:

Program terminated with signal 11, Segmentation fault.
#0  0xb760531f in ra_frame_return (frame=0x81886e0) at page.c:284
284     page.c: No such file or directory.
        in page.c
(gdb) bt
#0  0xb760531f in ra_frame_return (frame=0x81886e0) at page.c:284
#1  0xb7604586 in ra_readv (frame=0x81886e0, this=0x8076368, file_ctx=0x8181520, size=8192, offset=0) at read-ahead.c:412
#2  0xb7fa8964 in default_readv (frame=0x8174c68, this=0x80768c0, fd=0x8181520, size=8192, offset=0) at defaults.c:582
#3  0x0804c089 in fuse_readv (req=0x818a0c8, ino=1187, size=8192, off=0, fi=0xbfb93b5c) at fuse-internals.c:1910
#4  0xb7f947e9 in fuse_reply_err () from /usr/lib/libfuse.so.2
#5  0xb7f95733 in fuse_reply_entry () from /usr/lib/libfuse.so.2
#6  0xb7f96f26 in fuse_session_process () from /usr/lib/libfuse.so.2
#7  0x0804a8c8 in fuse_transport_notify (xl=0x80541f0, trans=0x8054398, event=<value optimized out>) at fuse-bridge.c:312
#8  0xb7faa9bd in transport_notify (this=0x8054398, event=1) at transport.c:148
#9  0xb7fab569 in sys_epoll_iteration (ctx=0xbfb93cdc) at epoll.c:53
#10 0xb7faaa6d in poll_iteration (ctx=0xbfb93cdc) at transport.c:251
#11 0x0804a11b in main (argc=4, argv=0xbfb93db4) at glusterfs.c:326
-- 8<--

All nodes started
with an empty directory, and most data was rsynced from an existing email system into the mounted gluster directory. The system uses three bricks, with the following AFR setup:

server1-brick => server1-mirror (on server2)
server2-brick => server2-mirror (on server3)
server3-brick => server3-mirror (on server1)

It doesn't matter which scheduler is used on the unify brick (although the backtrace above was created using the nufa scheduler).

The main issue here is that the system went dead in this situation. As GlusterFS will be used in cluster environments where HA is required, this is not good.

_______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?20200>

_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/
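For reference, the topology described above (each server's brick mirrored by AFR onto the next server, with unify on top) would look roughly like the following client-side volume spec. This is only a sketch for one brick/mirror pair: the hostnames, volume names, transport options, and the nufa option shown are assumptions, not the reporter's actual configuration.

```
# Hypothetical glusterfs-client.vol sketch (GlusterFS 1.3-era syntax)

volume server1-brick            # primary copy, exported from server1
  type protocol/client
  option transport-type tcp/client
  option remote-host server1
  option remote-subvolume brick
end-volume

volume server1-mirror           # mirror of server1's data, hosted on server2
  type protocol/client
  option transport-type tcp/client
  option remote-host server2
  option remote-subvolume mirror
end-volume

volume afr1                     # replicate brick and mirror
  type cluster/afr
  subvolumes server1-brick server1-mirror
end-volume

# afr2 and afr3 would be defined the same way for the
# server2/server3 and server3/server1 pairs.

volume unify0                   # aggregate the three replicated pairs
  type cluster/unify
  option scheduler nufa         # reporter notes any scheduler reproduces it
  subvolumes afr1 afr2 afr3
end-volume
```

The point of the sketch is just to make the failure domain visible: a read through unify0 passes down through an AFR pair into the read-ahead/protocol translators, which matches the ra_readv/ra_frame_return frames in the backtrace.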