Re: Whole failure when one glusterfsd brought down.

Hi Krishna,

Works as expected. Much appreciated!

I thought this was asked earlier, but I can't find the question:

Suppose one of my 3 servers dies and stays down for a while. During that downtime, other processes recreate files that still exist on the downed server. When I bring that server back up and it rejoins the filesystem, how will glusterfs know which copy of each file to use?

Should I make sure the downed server does not contain files that exist on the other gluster servers before adding it back into the filesystem? I understand that the directories should be created on it before I bring it back into glusterfs.

Thanks,
Dale

Krishna Srinivas wrote:
Hi Dale,

This is a recent change. You need to put the following line in the unify
volume for it to work when one of the nodes goes down:
"option readdir-force-success on"
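
For reference, here is a sketch of where this line might go in the client volfile quoted further down in this thread: inside the cluster/unify volume (named "bricks" there). Only the readdir-force-success line is new. The other lines are copied from that volfile, and the position among the other options is an assumption, since the note above only says the option belongs in unify.

```
volume bricks
  type cluster/unify
  subvolumes client1 client2 client3
  option readdir-force-success on   # per the note above: needed when a node is down
  option scheduler alu
  # ... remaining alu.* options unchanged ...
end-volume
```

The client would presumably need to be remounted after editing the volfile for the change to take effect.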

We discussed whether the user should know when one of the nodes
has gone down, and concluded that it is best to
leave it as an option for users to configure themselves.

Regards
Krishna


On 3/22/07, Dale Dude <dale@xxxxxxxxxxxxxxx> wrote:
Using code pulled March 21st @ 1am EST. Kernel 2.6.20-10 on Ubuntu
Feisty 32bit.

I have 3 machines serving with glusterfsd and mounted the cluster from
the first server. If I kill one of the glusterfsd's on any of the
machines, the mount point becomes broken with the 'Transport' error
below. Also, glusterfs produces this error even if I unmount and
remount while that one glusterfsd server is still down. Is this the expected
result, or shouldn't the mount continue to work even though one of the
servers has "died"?

ls: reading directory local/: Transport endpoint is not connected

glusterfs.log produces this:
[Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
protocol/client: client_protocol_xfer: :transport_submit failed
[Mar 21 06:25:17] [ERROR/tcp-client.c:284/tcp_connect()]
tcp/client:non-blocking connect() returned: 111 (Connection refused)
[Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
protocol/client: client_protocol_xfer: :transport_submit failed
[Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
protocol/client: client_protocol_xfer: :transport_submit failed
[Mar 21 06:25:17] [ERROR/tcp-client.c:284/tcp_connect()]
tcp/client:non-blocking connect() returned: 111 (Connection refused)
[Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
protocol/client: client_protocol_xfer: :transport_submit failed
[Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
protocol/client: client_protocol_xfer: :transport_submit failed


======================================
glusterfs-server.vol used by all the servers:
(ignore my bad volume naming, was just testing)

volume brick
  type storage/posix                   # POSIX FS translator
  option directory /home/export        # Export this directory
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes brick
end-volume

volume server
  type protocol/server
  option transport-type tcp/server     # For TCP/IP transport
  subvolumes iothreads
  option auth.ip.brick.allow * # Allow access to "brick" volume
end-volume


======================================
glusterfs-client.vol used on server1:
(ignore my bad volume naming, was just testing)

volume client1
  type protocol/client
  option transport-type tcp/client     # for TCP/IP transport
  option remote-host 1.1.1.1     # IP address of the remote brick
  option remote-subvolume brick        # name of the remote volume
end-volume

volume client2
  type protocol/client
  option transport-type tcp/client     # for TCP/IP transport
  option remote-host 2.2.2.2     # IP address of the remote brick
  option remote-subvolume brick        # name of the remote volume
end-volume

volume client3
  type protocol/client
  option transport-type tcp/client     # for TCP/IP transport
  option remote-host 3.3.3.3     # IP address of the remote brick
  option remote-subvolume brick        # name of the remote volume
end-volume

volume bricks
  type cluster/unify
  subvolumes client1 client2 client3
  option scheduler alu
  option alu.limits.min-free-disk  60GB              # Stop creating files when free-space lt 60GB
  option alu.limits.max-open-files 10000
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 2GB          # Units in KB, MB and GB are allowed
  option alu.disk-usage.exit-threshold  60MB         # Units in KB, MB and GB are allowed
  option alu.open-files-usage.entry-threshold 1024
  option alu.open-files-usage.exit-threshold 32
  option alu.stat-refresh.interval 10sec
end-volume

volume writebehind   #writebehind improves write performance a lot
  type performance/write-behind
  option aggregate-size 131072 # in bytes
  subvolumes bricks
end-volume


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel





