On 31/03/2010 06:14, Tom Lanyon wrote:
> On 31/03/2010, at 2:36 PM, Raghavendra G wrote:
>
>> Current design of write-behind acknowledges writes (to applications) even
>> when they've not hit the disk. Can you please explain how this design is
>> different (if it is different) from the idea you've explained above?
>
> Is this gluster method of write-behind acknowledging the writes before
> they've left the client? The method Ed was describing is that the write is
> acknowledged only once it has reached the server (and a defined number of
> replication targets), even though it hasn't been written to disk on the
> server yet. This is a hybrid approach which safeguards against client
> power failure before the write (which has already been acknowledged) gets
> pushed to any servers, but improves performance over end-to-end
> write-through as it does not wait for the write acknowledgement from the
> physical disk(s).

Agreed. So assuming, say, one client talking over the network to 100 server replicas (absurd, but useful for the purposes of clarification), our safety levels are:

1) ACK sent as soon as the app sends data to the client OS, before it has even left the client machine. Complete data loss is possible if the client is unplugged/dies at that instant. (weak / fast)

2) ACK sent only once data has been sent to all 100 replicas AND written to disk. Data loss is only possible if all replicas are lost. (strong / slowest)

3) ACK sent once X server machines have received the request (to RAM). Data loss is possible if all X server machines are lost before they write the request to disk. Good compromise of speed vs reliability guarantees.

In the simplest situation of a single server, we have roughly achieved the effect of moving the writeback cache to the server side. In the case of multiple servers with exactly equal latency to the client, we have roughly achieved the same as moving the writeback cache to the server side on all servers.
In the case of unequal latency between the client and the servers, or with server-side replication, or with very busy servers, we gain a performance improvement because of the lower latency before the ACK is sent to the client.

I thought this was a very clever technique and actually very compatible with the gluster philosophy (independent bricks).

Ed W
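
To make the three levels above a bit more concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch and not how gluster implements anything: the names (AckPolicy, ack_latency_ms), the per-server round-trip and disk-write times, and the idea that requests go to all servers in parallel are all assumptions made up for illustration.

```python
from enum import Enum


class AckPolicy(Enum):
    """Hypothetical names for the three safety levels discussed above."""
    CLIENT_CACHE = 1   # level 1: ACK before data leaves the client
    ALL_ON_DISK = 2    # level 2: ACK after every replica has written to disk
    QUORUM_IN_RAM = 3  # level 3: ACK after X servers hold the data in RAM


def ack_latency_ms(policy, round_trip_ms, disk_write_ms, quorum=1):
    """Estimate the time until the application sees the ACK.

    round_trip_ms[i] is the assumed network round trip to server i and
    disk_write_ms[i] its assumed disk write time. Requests are assumed
    to be sent to all servers in parallel.
    """
    if len(round_trip_ms) != len(disk_write_ms):
        raise ValueError("need one disk time per server")
    if policy is AckPolicy.CLIENT_CACHE:
        # Nothing has left the client, so no network or disk wait.
        return 0.0
    if policy is AckPolicy.ALL_ON_DISK:
        # Must wait for the slowest server to receive AND write the data.
        return max(r + d for r, d in zip(round_trip_ms, disk_write_ms))
    if policy is AckPolicy.QUORUM_IN_RAM:
        if not 1 <= quorum <= len(round_trip_ms):
            raise ValueError("quorum must be between 1 and the server count")
        # Wait only for the X-th fastest server to confirm receipt.
        return sorted(round_trip_ms)[quorum - 1]
    raise ValueError("unknown policy")


# Made-up example: three replicas, one of them far away or very busy.
round_trips = [1.0, 2.0, 10.0]
disk_writes = [8.0, 8.0, 8.0]

level1 = ack_latency_ms(AckPolicy.CLIENT_CACHE, round_trips, disk_writes)
level2 = ack_latency_ms(AckPolicy.ALL_ON_DISK, round_trips, disk_writes)
level3 = ack_latency_ms(AckPolicy.QUORUM_IN_RAM, round_trips, disk_writes, quorum=2)
print(level1, level2, level3)
```

With these invented numbers, level 3 with X=2 only waits for the second-fastest server, so the slow third replica no longer sits on the critical path. That is the unequal-latency gain described above. With equal latencies and X equal to the number of servers, level 3 reduces to "writeback cache on every server".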