Re: Crashing glusterfs server / sync not working

I upgraded to the latest tla (644) and added posix/locks to the server configs.

Self-heal works now and the server hasn't crashed since. It's still up and running. ls is a bit slow, but I think that's because self-heal is working very hard in the background (I can see a lot of messages in the logs).
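
For readers wondering what adding a locks translator to a server volfile can look like, here is a minimal sketch. It assumes that "posix/locks" refers to the features/posix-locks translator stacked on top of a storage/posix volume. The volume name pop2-mail-ds-posix is a hypothetical placeholder, and this is not the exact config used in this thread. The shell function only prints the fragment so it can be reviewed before being merged into glusterfs-server.vol.

```shell
#!/bin/sh
# Sketch only: prints a volfile fragment that stacks a locks translator
# on top of a storage/posix volume. The name "pop2-mail-ds-posix" is a
# hypothetical placeholder, not taken from the original config.
print_locks_fragment() {
    cat <<'EOF'
volume pop2-mail-ds-posix
        type storage/posix
        option directory /home/export/mailspool
end-volume

volume pop2-mail-ds
        type features/posix-locks
        subvolumes pop2-mail-ds-posix
end-volume
EOF
}

print_locks_fragment
```

Keeping the exported name (pop2-mail-ds) on the locks volume means that whatever referenced the plain posix volume before would now go through the locks translator without further renaming.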


Guido Smit wrote:
Anand,

The client log was:

2008-01-31 10:04:24 W [client-protocol.c:288:client_protocol_xfer] mailspool: attempting to pipeline request type(0) op(22) with handshake
2008-01-31 10:08:00 W [client-protocol.c:209:call_bail] mailspool: activating bail-out. pending frames = 2. last sent = 2008-01-31 10:04:24. last received = 1970-01-01 01:00:00 transport-timeout = 108
2008-01-31 10:08:00 C [client-protocol.c:217:call_bail] mailspool: bailing transport
2008-01-31 10:08:00 W [client-protocol.c:4503:client_protocol_cleanup] mailspool: cleaning up state in transport object 0x96a9930
2008-01-31 10:08:00 E [client-protocol.c:4555:client_protocol_cleanup] mailspool: forced unwinding frame type(0) op(34) reply=@0x9738e00
2008-01-31 10:08:00 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 18395: / => -1 (107)
2008-01-31 10:08:00 E [client-protocol.c:4555:client_protocol_cleanup] mailspool: forced unwinding frame type(0) op(22) reply=@0x9738e00
2008-01-31 10:08:00 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 18396: /blockbox.nl => -1 (107)
2008-01-31 10:08:00 C [tcp.c:81:tcp_disconnect] mailspool: connection disconnected

How do I get a backtrace from gdb?
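
One common way to do this is sketched below as a small shell helper. The paths in the usage comment are assumptions: adjust them to wherever glusterfsd is installed and wherever the core file was written. The helper runs gdb in batch mode and prints the backtrace of every thread in the core.

```shell
#!/bin/sh
# Sketch: print backtraces from a core file with gdb in batch mode.
# $1 = path to the binary that crashed, $2 = path to the core file.
print_backtrace() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "need a readable binary and a readable core file" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex command.
    gdb -batch -ex "thread apply all bt" "$1" "$2"
}

# Example (hypothetical paths):
#   print_backtrace /usr/sbin/glusterfsd /path/to/core
```

If no core file was written, core dumps may be disabled; running `ulimit -c unlimited` in the shell that starts glusterfsd is the usual way to enable them. Interactively, the same result comes from starting `gdb <binary> <core>` and typing `bt` at the prompt.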

Anand Avati wrote:
Guido,
Can you get a backtrace from gdb of the core? Also, what was the client log at that time?

avati

2008/1/31, Guido Smit <guido@xxxxxxxxx>:

    Hi all,

    I have fuse2.7.2gls8 and glusterfs tla on my 2 CentOS 5 machines.
    Everything works fine, as long as I don't sync the machines.
    One of the things I see all the time is that most files are not
    synced on both servers. When I try to force a sync using

        find /mail -type f -exec head -c 1 {} \; >/dev/null

    I get the following crash after a few minutes:

    2008-01-31 10:08:00 E [server-protocol.c:178:generic_reply] server: transport_writev failed
    2008-01-31 10:08:00 D [inode.c:308:__destroy_inode] mail/inode: destroy inode(0) [@0xb7e23a60]

    ---------
    got signal (11), printing backtrace
    ---------
    [0x537420]
    //lib/libglusterfs.so.0[0xd4390c]
    //lib/libglusterfs.so.0[0xd4390c]
    //lib/libglusterfs.so.0[0xd4390c]
    //lib/glusterfs/1.3.8/xlator/cluster/unify.so(unify_opendir_cbk+0xa3)[0x2e07f3]
    //lib/glusterfs/1.3.8/xlator/cluster/afr.so(afr_opendir_cbk+0x138)[0x91c9f8]
    //lib/glusterfs/1.3.8/xlator/protocol/client.so[0x1125b8]
    //lib/glusterfs/1.3.8/xlator/protocol/client.so(notify+0xa97)[0x116717]
    //lib/libglusterfs.so.0(transport_notify+0x37)[0xd47aa7]
    //lib/libglusterfs.so.0(sys_epoll_iteration+0xd7)[0xd487e7]
    //lib/libglusterfs.so.0(poll_iteration+0x7c)[0xd47bdc]
    [glusterfsd][0x8049432]
    //lib/libc.so.6(__libc_start_main+0xdc)[0xbe0dec]
    [glusterfsd][0x8048cf1]
    ---------

    My glusterfs-server.vol:

    volume pop1-mail-ns
            type protocol/client
            option transport-type tcp/client
            option remote-host 62.59.252.41
            option remote-subvolume pop1-mail-ns
            option transport-timeout 10
    end-volume

    volume pop1-mail-ds
            type protocol/client
            option transport-type tcp/client
            option remote-host 62.59.252.41
            option remote-subvolume pop1-mail-ds
            option transport-timeout 10
    end-volume

    volume pop2-mail-ns
            type storage/posix
            option directory /home/export/namespace
    end-volume

    volume pop2-mail-ds
            type storage/posix
            option directory /home/export/mailspool
    end-volume

    volume ns-afr
            type cluster/afr
            subvolumes pop1-mail-ns pop2-mail-ns
            option scheduler random
    end-volume

    volume ds-afr
            type cluster/afr
            subvolumes pop1-mail-ds pop2-mail-ds
            option scheduler random
    end-volume

    volume mail-unify
            type cluster/unify
            subvolumes ds-afr
            option namespace ns-afr
            option scheduler alu
            option alu.limits.max-open-files 10000   # Don't create files on a volume with more than 10000 files open
            option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
            option alu.disk-usage.entry-threshold 2GB   # Kick in if the discrepancy in disk-usage between volumes is more than 2GB
            option alu.disk-usage.exit-threshold  60MB   # Don't stop writing to the least-used volume until the discrepancy is 1988MB
            option alu.open-files-usage.entry-threshold 1024   # Kick in if the discrepancy in open files is 1024
            option alu.open-files-usage.exit-threshold 32   # Don't stop until 992 files have been written to the least-used volume
            option alu.stat-refresh.interval 10sec   # Refresh the statistics used for decision-making every 10 seconds
    end-volume

    volume mail-iothreads
            type performance/io-threads
            option thread-count 8
            option cache-size 64MB
            subvolumes mail-unify
    end-volume

    volume mail-wb
            type performance/write-behind
            subvolumes mail-iothreads
    end-volume

    volume mail
            type performance/read-ahead
            subvolumes mail-wb
    end-volume

    volume server
            type protocol/server
            option transport-type tcp/server
            subvolumes mail
            option auth.ip.pop2-mail-ds.allow 62.59.252.*,127.0.0.1
            option auth.ip.pop2-mail-ns.allow 62.59.252.*,127.0.0.1
            option auth.ip.mail.allow 62.59.252.*,127.0.0.1
    end-volume


    My glusterfs-client.vol:

    volume mailspool
            type protocol/client
            option transport-type tcp/client
            option remote-host 127.0.0.1
            option remote-subvolume mail
    end-volume

    volume writeback
            type performance/write-behind
            option aggregate-size 131072
            subvolumes mailspool
    end-volume

    volume readahead
            type performance/read-ahead
            option page-size 65536
            option page-count 16
            subvolumes writeback
    end-volume

    --
    Regards,

    Guido Smit
    DevInet




    _______________________________________________
    Gluster-devel mailing list
    Gluster-devel@xxxxxxxxxx
    http://lists.nongnu.org/mailman/listinfo/gluster-devel




--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.

--
Met vriendelijke groet,

Guido Smit
ComLog B.V.

Televisieweg 133
1322 BE Almere
T. 036 5470500
F. 036 5470481




