That makes all the sense in the world to me to have replication on the server side. I especially like the idea about network failover and not having to depend on client mounts to maintain consistency on the server side.

Majied

On Tue, 1 May 2007 09:05:28 -0700 Anand Avati <avati@xxxxxxxxxxxxx> wrote:
>
> Here is a design proposal about some changes to AFR and related pieces.
> Currently AFR is handled entirely on the client side, where the client
> does the replication as well as the failover. The AFR translator is
> essentially doing _two_ jobs: 1. replication, 2. failover.
>
> In view of the race condition recently discussed on the mailing list
> (two clients writing to the same region can race while writing to the
> second mirror), and for the other benefits mentioned below, the proposal
> is to split replication and failover into two separate translators.
> Replication is meant to be loaded on the server side, while failover
> alone is meant to be loaded on the client side.
>
> Imagine grouping your storage cluster into pairs, triplets, or
> quadruplets. The AFR translator will be loaded to form these groups,
> but on the server side. Each member of a (say) triplet will load AFR
> with one child as storage/posix and the other two children as
> protocol/clients pointing to the auxiliary exports of the remaining two
> servers. The effect is:
>
> * when you write to one server, the write goes to all three (redundancy)
> * and you can write via any server (used for failover)
>
> Under normal conditions, the failover translator on the client uses the
> 'primary child' (the non-auxiliary export server) and operations are
> performed only on that child; the server side takes care of replication.
> When that server goes down, failover detects the broken link and
> switches to the auxiliary export.
>
> Advantages:
>
> 1. Since a file is replicated by a single agent, there are no potential
> race conditions (most important).
>
> 2. The failover abstraction works for non-AFR scenarios too. You can use
> the failover translator to fail over between two network links to the
> same server (e.g. normally use InfiniBand, but fail over to gigabit
> completely seamlessly, even preserving open FDs).
>
> 3. The client writes to only one server, a tremendous saving of
> bandwidth on the link between client and server.
>
> 4. Self-heal checks can be performed more deterministically, since they
> are done by the 'primary child' server. There are no questions like
> 'what if two children try to heal together?' or 'what if no client is
> mounted at all?'
>
> 5. Extensions to AFR (like very lazy replication, on close()) will be a
> lot easier: the client submits a write to any server and forgets.
>
> 6. It becomes possible to implement 'transaction replay'-style features
> more easily, by preserving unwritten write() data with its offset etc.
> on the server itself (doing such things with AFR on the client is
> unreliable, since the client can always unmount and go away).
>
> 7. On the client side, failover is not the only option; even a
> 'loadbalance' translator would be a good choice (which takes care of not
> scheduling calls to a link that is down). Thus AFR will work hand in
> hand with failover and/or load balancing, however the user prefers.
> (Of course loadbalance will work within its own abstraction, where you
> can use it just to balance network links; remember somebody asking for
> this on the mailing list.)
>
> My instinct tells me there are more advantages I can list if I think it
> over more.
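>
> To make the triplet arrangement above concrete, the spec files for one
> member (say server1, with server2 and server3 as its peers) might look
> roughly like the sketch below. This is only a sketch of the idea: the
> hostnames, volume names, and the client-side 'cluster/failover'
> translator are placeholders for the proposed pieces, not existing
> options; only storage/posix, protocol/client, cluster/afr and
> protocol/server are the translators we already have.
>
>   # --- server1 spec file (replication done on the server) ---
>   volume brick                       # local backend store
>     type storage/posix
>     option directory /export/data
>   end-volume
>
>   volume server2-aux                 # auxiliary export of server2's brick
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host server2
>     option remote-subvolume brick
>   end-volume
>
>   volume server3-aux                 # auxiliary export of server3's brick
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host server3
>     option remote-subvolume brick
>   end-volume
>
>   volume afr                         # server-side replication group
>     type cluster/afr
>     subvolumes brick server2-aux server3-aux
>   end-volume
>
>   volume server                      # export 'afr' to clients and
>     type protocol/server             # 'brick' as the aux export for peers
>     option transport-type tcp/server
>     subvolumes afr brick
>     option auth.ip.afr.allow *
>     option auth.ip.brick.allow *
>   end-volume
>
>   # --- client spec file (failover only, no replication) ---
>   volume primary                     # normal path: server1's replicated export
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host server1
>     option remote-subvolume afr
>   end-volume
>
>   volume backup                      # used only when server1 is unreachable
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host server2
>     option remote-subvolume afr
>   end-volume
>
>   volume failover                    # proposed translator, not yet implemented
>     type cluster/failover
>     subvolumes primary backup
>   end-volume
>
> Whether the backup child points at server2's replicated export (as shown)
> or at a raw auxiliary export is a detail we can settle while implementing;
> the point is that the client never replicates, it only picks a live link.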
>
> I feel that failover and loadbalance as generic layers will add a lot of
> power and possibility for creative use, and AFR leveraging them fits in
> nicely overall.
>
> Suggestions/comments?
>
>
> avati
>
> --
> ultimate_answer_t
> deep_thought (void)
> {
>   sleep (years2secs (7500000));
>   return 42;
> }
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel