Re: very strange issue with 2.0.1



-------- Original Message --------
> Date: Fri, 05 Jun 2009 22:54:49 +0200
> From: "Steve" <steeeeeveee@xxxxxxx>
> To: gluster-devel@xxxxxxxxxx
> Subject: very strange issue with 2.0.1

> I have a very strange issue with 2.0.1. I have 2 systems. On each system
> there is a server AND a client running. The 2 servers use server-side
> afr/replicate, and the client on each system is connected to a single
> brick/volume exported by its local server. The client does not know anything
> about the other server.
> 
> Now when I benchmark and look at what goes over the wire, I cannot get
> more than +/- 14MB/s, no matter which performance translators I
> enable or disable. The speed is always around 14MB/s.
> 
> Now if I put NUFA on top of replicate, things change. I get a faster
> transfer, but the written file does not get replicated to the other server
> (I see NO network traffic for it). But if I go to the other server and run
> a simple "ls", network traffic goes up to +/- 50MB/s and the file
> shows up on the other server/client.
> 
> That sounds normal to me (well... NUFA is probably responsible, since it
> favors the local disk).
> 
> However... after running "ls" on the server/client the file was
> transferred to, the other server/client crashes with the following log:
> +------------------------------------------------------------------------------+
> [2009-06-05 22:31:48] N [afr.c:2190:notify] gfs-srv-ds-replicate:
> Subvolume 'gfs-srv-ds-locks' came back up; going online.
> [2009-06-05 22:31:48] N [afr.c:2190:notify] gfs-srv-ds-replicate:
> Subvolume 'gfs-srv-ds-locks' came back up; going online.
> [2009-06-05 22:31:48] N [afr.c:2190:notify] gfs-srv-ds-replicate:
> Subvolume 'gfs-srv-ds-locks' came back up; going online.
> [2009-06-05 22:31:48] N [afr.c:2190:notify] gfs-srv-ds-replicate:
> Subvolume 'gfs-srv-ds-locks' came back up; going online.
> [2009-06-05 22:31:48] N [glusterfsd.c:1152:main] glusterfs: Successfully
> started
> [2009-06-05 22:31:48] N [client-protocol.c:5557:client_setvolume_cbk]
> gfs-srv-ds-remote: Connected to 192.168.0.77:6997, attached to remote volume
> 'gfs-srv-ds-locks'.
> [2009-06-05 22:31:48] N [client-protocol.c:5557:client_setvolume_cbk]
> gfs-srv-ds-remote: Connected to 192.168.0.77:6997, attached to remote volume
> 'gfs-srv-ds-locks'.
> [2009-06-05 22:31:51] N [server-protocol.c:7035:mop_setvolume]
> gfs-srv-ds-server: accepted client from 192.168.0.77:1021
> [2009-06-05 22:31:51] N [server-protocol.c:7035:mop_setvolume]
> gfs-srv-ds-server: accepted client from 127.0.0.1:1023
> [2009-06-05 22:31:51] N [server-protocol.c:7035:mop_setvolume]
> gfs-srv-ds-server: accepted client from 127.0.0.1:1022
> [2009-06-05 22:31:51] N [server-protocol.c:7035:mop_setvolume]
> gfs-srv-ds-server: accepted client from 192.168.0.77:1020
> pending frames:
> frame : type(1) op(LOOKUP)
> 
> patchset: 5c1d9108c1529a1155963cb1911f8870a674ab5b
> signal received: 11
> configuration details:argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 2.0.1
> [0xffffe400]
> /usr/lib/glusterfs/2.0.1/xlator/protocol/client.so(client_lookup+0x96)[0xb75573fb]
> /usr/lib/glusterfs/2.0.1/xlator/cluster/replicate.so(afr_lookup+0x22f)[0xb7517a05]
> /usr/lib/glusterfs/2.0.1/xlator/cluster/nufa.so(nufa_lookup+0x3ea)[0xb7503fa9]
> /usr/lib/glusterfs/2.0.1/xlator/performance/io-threads.so(iot_lookup_wrapper+0xa5)[0xb74e8415]
> /usr/lib/libglusterfs.so.0(call_resume+0x344)[0xb7f47e01]
> /usr/lib/glusterfs/2.0.1/xlator/performance/io-threads.so(iot_worker_unordered+0x20)[0xb74e5895]
> /lib/libpthread.so.0[0xb7f154cf]
> /lib/libc.so.6(clone+0x5e)[0xb7e9b27e]
> ---------
> 
> 
> Now my questions:
> 
> 1) Is this issue known? I can reproduce that error and therefore I could
> send more info if needed.
> 
> 2) Why is Server 1 AFR/Replicate <-> Server 2 AFR/Replicate so slow? Just
> 14MB/s on GigE seems slow to me. Writing directly to the local disk
> (without GlusterFS) delivers +/- 57MB/s. Going over NFSv4 delivers +/- 45MB/s.
> Going over SSH delivers +/- 33MB/s. Transfer from Server 1 to Server 2 with
> the GlusterFS log delivers +/- 45MB/s. Pure server-to-server with
> AFR/Replicate delivers only 14MB/s. Why?
> 
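To put the numbers in question 2 in perspective, here is a small Python sketch that expresses the reported network-path figures as a share of Gigabit Ethernet line rate. It assumes "MB/s" means 10^6 bytes per second and ignores Ethernet/IP/TCP framing overhead (both assumptions, not stated in the report), so the real usable payload rate is somewhat below the 125 MB/s it computes. The names `gige_line_rate_mb_s`, `share_of_gige` and `REPORTED_MB_S` are illustrative only.

```python
# Sketch: express the throughput figures reported above as a share of
# Gigabit Ethernet line rate. Assumptions (not from the report): "MB/s" means
# 10**6 bytes per second, and Ethernet/IP/TCP framing overhead is ignored, so
# the real usable payload rate is somewhat below the 125 MB/s computed here.

GIGE_BITS_PER_SECOND = 1_000_000_000


def gige_line_rate_mb_s() -> float:
    """Raw GigE capacity in decimal megabytes per second."""
    return GIGE_BITS_PER_SECOND / 8 / 1_000_000


def share_of_gige(mb_per_s: float) -> float:
    """Percentage of raw GigE line rate used by a given throughput."""
    return 100.0 * mb_per_s / gige_line_rate_mb_s()


# Approximate network-path figures quoted in the report.
REPORTED_MB_S = {
    "NFSv4": 45,
    "SSH": 33,
    "GlusterFS, server 1 to server 2": 45,
    "server-side AFR/replicate": 14,
}

if __name__ == "__main__":
    print(f"GigE line rate: {gige_line_rate_mb_s():.0f} MB/s")
    for path, rate in REPORTED_MB_S.items():
        print(f"{path:<32} {rate:>3} MB/s = {share_of_gige(rate):5.1f}% of line rate")
```

Under these assumptions replicate uses about 11.2% of the raw link versus 36.0% for NFSv4, which is why 14 MB/s looks suspicious; the arithmetic says nothing about where the time is actually spent.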
Ah! I did more testing, and the conclusion is:
Using 128KB chunks for writing changes the speed. Local disk writes then reach almost 90MB/s, and GlusterFS reaches around 45MB/s. I guess I have no real speed issue. I would love to come close to 90MB/s, but 45MB/s is fine.
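The block-size effect can be reproduced with a small Python sketch that writes the same amount of data with small and with 128 KiB writes and reports MB/s for each. The target directory (for example a GlusterFS mount point) is a hypothetical command-line argument; the 4 KiB comparison size and the 256 MiB total are assumptions, since the report does not say which tool or default chunk size the first runs used.

```python
# Sketch: measure sequential write throughput for different write sizes, to
# reproduce the "128KB chunks" observation. The target directory (for example a
# GlusterFS mount point) is a hypothetical command-line argument; the 4 KiB
# comparison size and the 256 MiB total are assumptions, not values from the report.
import os
import sys
import tempfile
import time


def write_throughput_mb_s(path: str, block_size: int, total_bytes: int) -> float:
    """Write total_bytes // block_size blocks of block_size bytes to path,
    fsync, and return the achieved rate in MB/s (10**6 bytes per second)."""
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    start = time.perf_counter()
    # buffering=0: each f.write() becomes one write(2) call of block_size bytes,
    # so the chosen size actually reaches the filesystem (e.g. FUSE/GlusterFS).
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            view = memoryview(block)
            while view:  # handle (rare) short writes
                written = f.write(view)
                view = view[written:]
        os.fsync(f.fileno())  # include flushing to storage in the timing
    elapsed = time.perf_counter() - start
    return blocks * block_size / elapsed / 1_000_000


def main() -> None:
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    total = 256 * 1024 * 1024
    for block_size in (4 * 1024, 128 * 1024):
        fd, path = tempfile.mkstemp(dir=target_dir)
        os.close(fd)
        try:
            rate = write_throughput_mb_s(path, block_size, total)
            print(f"{block_size // 1024:>4} KiB writes: {rate:6.1f} MB/s")
        finally:
            os.remove(path)


if __name__ == "__main__":
    main()
```

Usage, with a hypothetical script name and mount point: `python3 blockbench.py /mnt/glusterfs`. A plausible but unverified explanation for the speed-up is that larger writes spread the fixed per-request cost (the FUSE round trip and, for replicate, the network round trip to the other server) over more data.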

However... the crash still happens when using NUFA. Maybe I just messed up and tried too many different (and obscure) combinations, because NUFA never failed on me in the past?


> // Steve
> 
// Steve