glusterfs-1.3.8pre1

Hello,

*Q1: *

I've installed glusterfs-1.3.8pre1 on one node of a cluster running 1.3.7, but glusterfsd 1.3.8 drops incoming connections from 1.3.7 clients. Is this by design? Do I need to upgrade everything to 1.3.8pre1 at once?

Here's the relevant part of /var/log/glusterfs/glusterfsd.log:

2008-02-22 17:17:33 C [tcp.c:87:tcp_disconnect] server: connection disconnected
2008-02-22 17:17:33 E [server-protocol.c:183:generic_reply] server: transport_writev failed
2008-02-22 17:17:33 C [tcp.c:87:tcp_disconnect] server: connection disconnected
2008-02-22 17:17:33 E [server-protocol.c:183:generic_reply] server: transport_writev failed
2008-02-22 17:17:33 C [tcp.c:87:tcp_disconnect] server: connection disconnected
2008-02-22 17:17:33 C [tcp.c:87:tcp_disconnect] server: connection disconnected


*Q2: *

The reasons I'm trying to upgrade to 1.3.8 are the following.

Current configuration:

- 16 clients / 16 servers (one client and one server on each machine)
- servers are dual Opteron, some of them quad-core, with 8 or 12 GB RAM
- kernel 2.6.24-2, Gentoo Linux (I can provide the gluster ebuilds)
- fuse 2.7.2glfs8, glusterfs 1.3.7 (see config files) - basically a simple unify with no read-ahead/write-behind cache

Configs are here: http://gluster.pastebin.com/m7f61927f
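In case the pastebin link goes stale, here is a rough sketch of the shape of our client-side setup in the 1.3.x volfile syntax (hostnames, volume names, and the scheduler choice below are placeholders for illustration, not our actual config - see the pastebin for the real one):

```
# client volfile sketch: one protocol/client volume per server,
# one namespace volume, and a cluster/unify volume on top
volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host server1        # placeholder hostname
  option remote-subvolume brick
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host server2        # placeholder hostname
  option remote-subvolume brick
end-volume

volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host server1        # namespace held on a single server
  option remote-subvolume brick-ns
end-volume

volume unify0
  type cluster/unify
  option namespace ns
  option scheduler rr               # placeholder scheduler choice
  subvolumes brick1 brick2
end-volume
```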
All servers are stable and the problems below are in normal running conditions.

Inside the gluster filesystem we store ~3 million pictures, in a directory tree that guarantees at most 1k entries (pictures or subdirectories) per directory, with ~30 writes and ~300 reads per second. Files are relatively small, 4-5 KB per picture.
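For reference, the bounded fan-out comes from deriving the directory components from the numeric picture ID; a minimal sketch of the idea in Python (illustrative only, not our exact production code):

```python
def shard_path(pic_id: int, fanout: int = 1000) -> str:
    """Build a path whose every directory level holds fewer than
    `fanout` entries, by splitting the ID into base-`fanout` digits."""
    parts = []
    n = pic_id // fanout  # the last `fanout` IDs share a leaf directory
    while n:
        parts.append(str(n % fanout))
        n //= fanout
    parts.reverse()
    return "/".join(parts + ["%d.jpg" % pic_id])
```

For example, `shard_path(7092182)` yields `7/92/7092182.jpg`; every directory component stays below the 1k-entries bound by construction.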


1. glusterfs (the client) appears to leak memory in our configuration - about 300 MB of RAM consumed over 2 days.

2. frequent files with size 0 and ctime 0 (1970), even when all servers are up and running.

3. occasional files with correct size/ctime that cannot be read from one server, though sometimes they can be read from other servers.

4. back when I was using AFR for a mirrored namespace (which I gave up on while trying to alleviate the other errors), glusterfs (the client) crashed in AFR when one of the servers was shutting down.

These errors appear in glusterfs.log when a file cannot be read:

2008-02-22 17:37:36 E [unify.c:837:unify_open] nowb-nora-client-stable-www: /tmpfs/small/1/70/92/7092182.jpg: entry_count is 4
2008-02-22 17:34:51 E [unify.c:790:unify_open_cbk] nowb-nora-client-stable-www: Open success on namespace, failed on child node


*Q3:*

Nice-to-haves:

1. A redundant namespace => no single point of failure.

2. A way to see a diagram of the cluster, its connected nodes, etc. - pulling live data from one of the servers/clients - for anyone running more than 2-3 servers.


As a side question, our organization could commit a part-time developer dedicated to helping out with glusterfs; are you interested?

Best regards,
Dan

