Hans K. Rosbach wrote:
> On Wed, 2011-06-08 at 12:34 +0100, Gordan Bobic wrote:
>> Hans K. Rosbach wrote:
>>> -SCTP support, this might not be a silver bullet but it feels
>>> [...]
>>> Features that might need glusterfs code changes:
>>> [...]
>>> -Multihoming (failover when one nic dies)
>> How is this different to what can be achieved (probably much more
>> cleanly) with NIC bonding?
> NIC bonding is nice for a small network, but routed networks might
> benefit from this. This is not something I feel that I need, but I am
> sure it would be an advantage for some other users. It could possibly
> help in geo-replication setups, for example.
Not sure what routedness has to do with this. If you need route failover
this is probably something best done by having a HA/cluster service
change the routing table accordingly.
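Something along these lines would do (a rough, untested sketch in Python;
the gateways, interfaces and ping test are placeholders, and in practice
you would let heartbeat/keepalived or your cluster manager own this):

  #!/usr/bin/env python
  # Watchdog sketch: fail the default route over to a backup gateway
  # when the primary stops answering pings. Addresses/interfaces below
  # are made up.
  import subprocess, time

  PRIMARY = ("192.168.1.1", "eth0")   # placeholder primary gateway/nic
  BACKUP  = ("192.168.2.1", "eth1")   # placeholder backup gateway/nic

  def gateway_up(gw):
      # Single ping with a 2 second timeout; good enough for a sketch.
      return subprocess.call(["ping", "-c", "1", "-W", "2", gw]) == 0

  def set_default_route(gw, dev):
      # "ip route replace" is idempotent, so repeating it is harmless.
      subprocess.call(["ip", "route", "replace", "default",
                       "via", gw, "dev", dev])

  current = None
  while True:
      wanted = PRIMARY if gateway_up(PRIMARY[0]) else BACKUP
      if wanted != current:
          set_default_route(*wanted)
          current = wanted
      time.sleep(5)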
>>> -Ability to have the storage nodes autosync themselves.
>>> In our setup the normal nodes have 2x1Gbit connections while the
>>> storage boxes have 2x10Gbit connections, so having the storage
>>> boxes use their own bandwidth and resources to sync would be nice.
>> Sounds like you want server-side rather than client-side replication.
>> You could do this by using afr/replicate on the servers, and export via
>> NFS to the clients. Have failover handled as for any normal NFS server.
> We have considered this, and might decide to go down this route
> eventually; however, it seems strange that this cannot also be done
> using the native client.
Is the current NFS wheel not quite round enough for you? ;)
> The fact that each client writes to both servers is fine, but the
> fact that the clients need to do the re-sync work whenever the
> storage nodes are out of sync (one of them rebooted, for example)
> seems strange and feels very unreliable, especially since this is
> a manual operation.
There is a plan C, though. You can make the servers also clients. You
can then have a process that does "ls -laR" periodically or upon failure.
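Something as dumb as the following, run from cron on the servers, is all
that process needs to be (a minimal sketch; the mount point is a
placeholder, and stat'ing every entry through the client mount is what
triggers the self-heal):

  #!/usr/bin/env python
  # Walk the whole glusterfs mount and stat everything -- the moral
  # equivalent of "ls -laR". The mount point below is just an example.
  import os

  MOUNT = "/mnt/glusterfs"

  for root, dirs, files in os.walk(MOUNT):
      for name in dirs + files:
          try:
              os.lstat(os.path.join(root, name))
          except OSError:
              pass  # entry vanished while we were walking, ignore it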
>>> -An ability for the clients to subscribe to metadata updates for
>>> a specific directory would also be nice, so that they can cache that
>>> folder's stats while working there and still know that they will not
>>> miss any changes. This would perhaps increase overhead in large
>>> clusters but could improve performance by a lot in clusters where
>>> several nodes work in the same folder (a mail spool, for example).
>> You have a shared mail spool on your nodes? How do you avoid race
>> conditions on deferred mail?
> Several nodes can deliver mails to the spool folder, and dedicated queue
> runners will pick them up and deliver them to local and/or remote hosts.
> I am not certain what race conditions you are referring to, but locking
> should make sure no more than one queue runner touches a file at a
> time. Am I missing something?
Are you sure your MTA applies locks suitably? I wouldn't bet on it. I
would expect that most of them assume unshared spools. Also remember
that locking is a _major_ performance bottleneck when it comes to
cluster file systems. Multiple nodes doing locking and r/w in the same
directory will have an inverse scaling impact on performance, especially
on small I/O such as you are likely to experience on a mail spool.
If there is no file locking you will likely see non-deterministic
multiple sending of mail, especially deferred mail. Depending on how
your MTA produces mail spool file names, you may see non-deterministic
silent clobbering, too, if it doesn't do parent directory locking on
file creation/deletion.
If there is locking, you will likely see that the performance starts to
reduce as you add more servers due to lock contention.
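To be concrete about what "locking" means here, each queue runner has to
do something like the following for every message it touches (a rough
Python sketch; deliver() is a stand-in for the real delivery step, and
real MTAs differ in how, and whether, they lock):

  import errno
  import fcntl

  def process_spool_file(path):
      f = open(path, "r+")
      try:
          # POSIX lock, non-blocking: if another queue runner already
          # holds it, skip this message and move on.
          fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
      except (IOError, OSError) as e:
          if e.errno in (errno.EACCES, errno.EAGAIN):
              f.close()
              return False
          raise
      try:
          deliver(f)  # hypothetical delivery step
      finally:
          fcntl.lockf(f, fcntl.LOCK_UN)
          f.close()
      return True

On a cluster file system every one of those lock/unlock calls has to go
over the wire, and that is exactly where the contention I am talking
about comes from.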
Gordan