setup issues and errors

Hello All,

We have recently switched from NFS to Gluster for sharing images
between a cluster of web servers. I have noticed a few issues, and am
hoping someone has some advice.

One of the main reasons for the switch was redundancy: if one server goes
down, the clients should continue to read and write images, and when the
server comes back it gets resynced. Whenever we had trouble with NFS, the
whole site was crippled or down, and we are trying to get away from that.

We put Gluster into production a couple of weeks ago and have
seen a few issues since then.

1) When one of the servers was under load (from something else running on
the same box), we started getting a lot of timeout errors from the
app. I turned off the Gluster server on that host and things were OK
again.

2) We rebooted one of the clients and Gluster was not started on boot,
so we made some config changes and rebooted again to check that it came
up. It did not (our issue, I know), so we started it manually. We then
started getting a lot of timeout errors from our app, and then a lot
from all the other clients too. I ended up killing Gluster and
remounting all the clients, and things seem to be OK now. Sorry to be
so vague; I just don't have a lot of data yet.

3) A client box became totally unresponsive and had to be power
cycled. We suspect it was Gluster-related, as it had a really high load
not long after the event above.

From the logs on one of the servers (a snippet; most of the log looks like this):

2009-02-25 17:48:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:44380)
2009-02-25 17:48:01 W [posix.c:959:posix_create] brick-ns: open on /images/b/bd/Logo-southerncrosshumanitarian-org.jpg: No such file or directory
2009-02-25 17:49:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:44383)
2009-02-25 17:49:58 W [posix.c:959:posix_create] brick-ns: open on /images/a/a1/Logo-replica-designers-com.gif: No such file or directory
2009-02-25 17:50:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:44388)
2009-02-25 17:51:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:44391)
2009-02-25 17:51:47 W [posix.c:959:posix_create] brick-ns: open on /images/9/9b/Logo-brisbanetraybodys-com-au.png: No such file or directory
2009-02-25 17:52:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:44396)
2009-02-25 17:52:29 W [posix.c:959:posix_create] brick-ns: open on /images/e/e5/Logo-callverse-com.gif: No such file or directory
2009-02-25 17:53:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:53321)
2009-02-25 17:54:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:53326)
2009-02-25 17:55:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:53336)
2009-02-25 17:55:36 W [posix.c:959:posix_create] brick-ns: open on /images/6/6f/jigsaw-logo.png: No such file or directory
2009-02-25 17:56:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:53337)
2009-02-25 17:57:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:53342)
2009-02-25 17:58:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:54064)
2009-02-25 17:59:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:54067)
2009-02-25 18:00:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:54072)
2009-02-25 18:01:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:54079)
2009-02-25 18:01:30 W [posix.c:959:posix_create] brick-ns: open on /images/8/86/Portrait-KARACTERE.jpg: No such file or directory
2009-02-25 18:01:34 W [posix.c:959:posix_create] brick-ns: open on /images/e/e4/Cropped-Portrait-KARACTERE.jpg: No such file or directory
2009-02-25 18:02:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:54084)
2009-02-25 18:03:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (127.0.0.1:43887)

From one of the clients:
2009-02-25 17:27:12 E [client-protocol.c:4430:client_lookup_cbk] brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:27:12 E [client-protocol.c:325:client_protocol_xfer] brick-ns1: transport_submit failed
2009-02-25 17:29:13 C [client-protocol.c:212:call_bail] brick-ns1: bailing transport
2009-02-25 17:29:13 E [client-protocol.c:4834:client_protocol_cleanup] brick-ns1: forced unwinding frame type(2) op(6) reply=@0x2aaab4467d80
2009-02-25 17:29:13 E [client-protocol.c:4277:client_unlock_cbk] brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:29:13 E [client-protocol.c:325:client_protocol_xfer] brick-ns1: transport_submit failed
2009-02-25 17:34:28 E [afr.c:4625:afr_create_cbk] afr-ns: (path=/images/8/8a/Portrait-Dan_Korn.jpg child=brick-ns2) op_ret=-1 op_errno=2
2009-02-25 17:34:28 E [afr.c:4625:afr_create_cbk] afr-ns: (path=/images/6/65/Cropped-Portrait-Dan_Korn.jpg child=brick-ns2) op_ret=-1 op_errno=2
2009-02-25 17:36:35 C [client-protocol.c:212:call_bail] brick-ns1: bailing transport
2009-02-25 17:36:35 E [client-protocol.c:4834:client_protocol_cleanup] brick-ns1: forced unwinding frame type(1) op(40) reply=@0x2aaab408f390
2009-02-25 17:36:35 E [client-protocol.c:4613:client_checksum_cbk] brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:56:53 C [client-protocol.c:212:call_bail] brick-ns1: bailing transport
2009-02-25 17:56:53 E [client-protocol.c:4834:client_protocol_cleanup] brick-ns1: forced unwinding frame type(2) op(5) reply=@0x2aaab46410a0
2009-02-25 17:56:53 E [client-protocol.c:4246:client_lock_cbk] brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:56:53 E [client-protocol.c:325:client_protocol_xfer] brick-ns1: transport_submit failed
2009-02-25 17:58:34 C [client-protocol.c:212:call_bail] brick-ns1: bailing transport
2009-02-25 17:58:34 E [client-protocol.c:4834:client_protocol_cleanup] brick-ns1: forced unwinding frame type(1) op(34) reply=@0x2aaab41634e0
2009-02-25 17:58:34 E [client-protocol.c:4430:client_lookup_cbk] brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:58:34 E [client-protocol.c:325:client_protocol_xfer] brick-ns1: transport_submit failed
2009-02-25 17:59:24 C [client-protocol.c:212:call_bail] brick-ns1: bailing transport
2009-02-25 17:59:24 E [client-protocol.c:4834:client_protocol_cleanup] brick-ns1: forced unwinding frame type(2) op(6) reply=@0x2aaab407b890
2009-02-25 17:59:24 E [client-protocol.c:4277:client_unlock_cbk] brick-ns1: no proper reply from server, returning ENOTCONN
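To get a bit more data out of logs like these, a small script can tally entries by severity, translator and source function, and measure the spacing between repeated messages. This is a minimal sketch, assuming each entry sits on one line in the form "DATE TIME LEVEL [file.c:line:function] translator: message" (inferred from the excerpts above, not from GlusterFS documentation); the function names are hypothetical. In the server excerpt, the "EOF from peer (127.0.0.1:...)" entries land at :01 past every minute, so the gap list for that message would come out as a run of 60-second intervals.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# ASSUMPTION: entry layout inferred from the pasted excerpts:
# "YYYY-MM-DD HH:MM:SS LEVEL [file.c:line:function] translator: message"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<xlator>[^:]+): (?P<msg>.*)$"
)


def parse_lines(lines):
    """Yield one dict per log entry; lines that do not match are skipped."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            yield m.groupdict()


def summarize(lines):
    """Count entries per (level, translator, source function)."""
    counts = Counter()
    for entry in parse_lines(lines):
        func = entry["src"].rsplit(":", 1)[-1]  # "protocol.c:271:fn" -> "fn"
        counts[(entry["level"], entry["xlator"], func)] += 1
    return counts


def gaps_seconds(lines, needle):
    """Seconds between consecutive entries whose message contains `needle`."""
    stamps = [
        datetime.strptime(e["date"] + " " + e["time"], "%Y-%m-%d %H:%M:%S")
        for e in parse_lines(lines)
        if needle in e["msg"]
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py /path/to/glusterfs.log
    with open(sys.argv[1]) as fh:
        log = fh.readlines()
    for (level, xlator, func), n in summarize(log).most_common():
        print(f"{n:5d}  {level}  {xlator:10s}  {func}")
    print("gaps between 'EOF from peer' entries (s):",
          gaps_seconds(log, "EOF from peer"))
```

Running it separately over a server log and a client log makes it easier to line up, for example, the client-side `call_bail` entries on `brick-ns1` against what the server logged at the same minutes.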

______________
server conf
_____________

volume brick
 type storage/posix
 option directory /opt/glusterfs/share/
end-volume

volume brick-ns
 type storage/posix
 option directory /opt/glusterfs/share-ns/
end-volume

volume server
 type protocol/server
 option transport-type tcp/server
 option client-volume-filename /etc/glusterfs/glusterfs-client.vol
 subvolumes brick brick-ns
 option auth.ip.brick.allow 10.*    # Allow access to "brick" volume
 option auth.ip.brick-ns.allow 10.* # Allow access to "brick-ns" volume
end-volume

# performance changes
volume locks
 type features/posix-locks
 option mandatory-locks on
 subvolumes brick
end-volume

volume iothreads
 type performance/io-threads
 option thread-count 8
 subvolumes locks
end-volume


___________
client config
__________

volume brick1
 type protocol/client
 option transport-type tcp/client
 option remote-host cumulus.adm      # hostname or IP address of the remote brick
 option remote-subvolume brick        # name of the remote volume
end-volume

volume brick2
 type protocol/client
 option transport-type tcp/client
 option remote-host dbs3.adm
 option remote-subvolume brick
end-volume

volume brick-ns1
 type protocol/client
 option transport-type tcp/client
 option remote-host cumulus.adm
 option remote-subvolume brick-ns  # Note the different remote volume name.
end-volume

volume brick-ns2
 type protocol/client
 option transport-type tcp/client
 option remote-host dbs3.adm
 option remote-subvolume brick-ns  # Note the different remote volume name.
end-volume

volume afr1
 type cluster/afr
 subvolumes brick1 brick2
end-volume

volume afr-ns
 type cluster/afr
 subvolumes brick-ns1 brick-ns2
end-volume

volume unify
 type cluster/unify
 option namespace afr-ns
 option scheduler rr
 subvolumes afr1
end-volume


# performance changes

volume writebehind
  type performance/write-behind
  option aggregate-size 128KB
  option window-size 1MB
  subvolumes unify
end-volume

volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume
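One quick sanity check on volfiles like the two above is to build the translator graph from the `subvolumes` lines and list which volumes nothing else stacks on (graph roots), plus any subvolume names that are never defined. The sketch below is a simplified parser, assuming only the `volume NAME`, `type`, `subvolumes`, `end-volume` and `#` comment syntax seen in this post; the function names are hypothetical. Run against the server file above, it would list both `server` and `iothreads` as roots: `locks` and `iothreads` are stacked on `brick`, but `protocol/server` exports `brick` directly. Whether that means the locking and io-threads translators are bypassed depends on your GlusterFS version, so treat it as something to verify rather than a diagnosis. A mistyped `volume` keyword would show up as an undefined subvolume.

```python
import sys


def parse_volfile(text):
    """Return {name: {"type": str, "subvolumes": [names]}} in file order.

    ASSUMPTION: simplified volfile syntax; options other than `type` and
    `subvolumes` are ignored, and `#` starts a comment.
    """
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        words = line.split()
        key = words[0]
        if key == "volume":
            current = words[1]
            volumes[current] = {"type": None, "subvolumes": []}
        elif key == "end-volume":
            current = None
        elif current is not None and key == "type":
            volumes[current]["type"] = words[1]
        elif current is not None and key == "subvolumes":
            volumes[current]["subvolumes"] = words[1:]
    return volumes


def roots(volumes):
    """Volumes that no other volume lists under `subvolumes`, in file order."""
    used = {s for v in volumes.values() for s in v["subvolumes"]}
    return [name for name in volumes if name not in used]


def undefined(volumes):
    """Subvolume names that are referenced but never defined."""
    referenced = {s for v in volumes.values() for s in v["subvolumes"]}
    return sorted(referenced - set(volumes))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_volfile.py server.vol client.vol
    for path in sys.argv[1:]:
        with open(path) as fh:
            vols = parse_volfile(fh.read())
        print(path, "roots:", roots(vols), "undefined:", undefined(vols))
```

A client volfile would normally have a single root (here the topmost performance translator); more than one root in either file is worth a second look.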


Any advice would be greatly appreciated.
