On Wed, Mar 31, 2010 at 1:40 PM, Ed W <lists at wildgooses.com> wrote:

> On 31/03/2010 06:14, Tom Lanyon wrote:
>
>> On 31/03/2010, at 2:36 PM, Raghavendra G wrote:
>>
>>> Current design of write-behind acknowledges writes (to applications) even
>>> when they've not hit the disk. Can you please explain how this design is
>>> different (if it is different) from the idea you've explained above?
>>>
>> Is this gluster method of write-behind acknowledging the writes before
>> they've left the client? The method Ed was describing is that the write is
>> acknowledged only once it's reached the server (and a defined number of
>> replication targets), even though it hasn't been written to disk on the
>> server yet. This is a hybrid approach which safeguards against client
>> power failure before the write (which has already been acknowledged) gets
>> pushed to any servers, but improves performance over end-to-end
>> write-through as it does not wait for the write acknowledgement from the
>> physical disk(s).
>>
>
> Agreed. So assume, say, one client talking over the network to 100 server
> replicas (absurd, but useful for the purposes of clarification).
>
> Our safety levels are:
>
> 1) ACK sent as soon as the app sends data to the client OS and before it's
> even left the client machine. Complete data loss possible if the client is
> unplugged/dies at that instant. (weak / fast)
>

This functionality can be achieved in glusterfs by loading the write-behind
translator on the client.

>
> 2) ACK sent only once data has been sent to all 100 replicas AND written to
> disk. Data loss only possible if all replicas are lost. (strong / slowest)
>

Write-behind is not needed in this case.

>
> 3) ACK sent once X server machines have received the request (to RAM).
> Data loss possible if all of those server machines are lost before they
> write the request to disk.
> Good compromise of speed vs reliability guarantees.
>

This functionality can be achieved by loading the write-behind translator on
the server side (on top of the posix translator).

>
> In the simplest situation of a single server, we have roughly achieved
> the effect of moving the writeback cache to the server side. In the case of
> multiple servers with exactly equal latency to the client, we have roughly
> achieved the same as moving the writeback cache to the server side on all
> servers. In the case of unequal latency between client and servers, or
> with server-side replication, or with very busy servers, we gain a
> performance improvement due to the lower latency before the ACK is sent to
> the client.
>
> I thought this was a very clever technique and actually very compatible
> with the gluster philosophy (independent bricks).
>
> Ed W
>

regards,
--
Raghavendra G
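A small toy model can compare the three safety levels discussed above. This is a sketch under stated assumptions, not GlusterFS code. The policy names, the `quorum` parameter (Ed's X), and the latency model are all hypothetical. The latency model assumes that writes go to all replicas in parallel and that each disk write takes a fixed time. The model also assumes that once the ACK arrives, the only copies that still exist are the ones each level guarantees.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class AckPolicy(Enum):
    """The three safety levels from the thread (names are hypothetical)."""
    CLIENT_WRITE_BEHIND = 1   # 1) ACK before the data leaves the client
    WRITE_THROUGH_ALL = 2     # 2) ACK after every replica has written to disk
    SERVER_RAM_QUORUM = 3     # 3) ACK once `quorum` servers hold the data in RAM


@dataclass(frozen=True)
class AckedWrite:
    """Where an acknowledged write lives at the instant the ACK is sent."""
    in_client_ram: bool
    servers_in_ram: int    # replicas holding the data only in RAM
    servers_on_disk: int   # replicas that have persisted the data


def state_at_ack(policy: AckPolicy, replicas: int, quorum: int) -> AckedWrite:
    # Assumption: after the ACK, only the copies listed here are guaranteed to exist.
    if policy is AckPolicy.CLIENT_WRITE_BEHIND:
        return AckedWrite(in_client_ram=True, servers_in_ram=0, servers_on_disk=0)
    if policy is AckPolicy.WRITE_THROUGH_ALL:
        return AckedWrite(in_client_ram=False, servers_in_ram=0, servers_on_disk=replicas)
    return AckedWrite(in_client_ram=False, servers_in_ram=quorum, servers_on_disk=0)


def survives(w: AckedWrite, client_dies: bool,
             ram_copies_lost: int, disk_copies_lost: int) -> bool:
    """True if at least one copy of the acknowledged write outlives the failure."""
    client_copy = w.in_client_ram and not client_dies
    ram_left = max(w.servers_in_ram - ram_copies_lost, 0)
    disk_left = max(w.servers_on_disk - disk_copies_lost, 0)
    return client_copy or ram_left > 0 or disk_left > 0


def ack_latency_ms(policy: AckPolicy, rtts_ms: Sequence[float],
                   disk_ms: float, quorum: int) -> float:
    """Time until the application sees the ACK; writes go to all replicas in parallel."""
    if policy is AckPolicy.CLIENT_WRITE_BEHIND:
        return 0.0  # no network round trip before the ACK
    if policy is AckPolicy.WRITE_THROUGH_ALL:
        return max(rtt + disk_ms for rtt in rtts_ms)  # wait for the slowest disk write
    return sorted(rtts_ms)[quorum - 1]  # reply from the quorum-th fastest server


if __name__ == "__main__":
    rtts = [2.0, 5.0, 40.0]  # made-up round-trip times to three servers
    for p in AckPolicy:
        w = state_at_ack(p, replicas=len(rtts), quorum=2)
        print(p.name,
              ack_latency_ms(p, rtts, disk_ms=8.0, quorum=2),
              survives(w, client_dies=True, ram_copies_lost=1, disk_copies_lost=1))
```

The example uses made-up round-trip times (2, 5 and 40 ms), an 8 ms disk write and X = 2. With those numbers, level 3 acknowledges after 5.0 ms and level 2 after 48.0 ms. The failure scenario is a client death plus the loss of one server. Only level 1 loses the write in that case. This illustrates Ed's point: with unequal server latencies, the ACK under level 3 waits only for the X-th fastest server rather than the slowest disk.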