In RHS 2.1's road-map is the DRC (hereafter, cache), which has the following requirements in Docspace: NFSv3 duplicate reply cache for non-idempotent ops, cluster-aware, persistent across reboots (performance, correctness).

* For persistence across reboots, the DRC needs to cache replies in files. However, this will significantly degrade the overall performance of non-idempotent operations (write, rename, create, unlink, etc.). An in-memory cache eliminates the overhead of writing each reply to persistent storage, but at the obvious cost of losing the DRC on crashes and reboots. AFAIK, the Linux kernel's implementation is currently in-memory only. As such, we need to evaluate the actual impact on performance and weigh it against the advantages of a persistent cache.

* A cluster-aware DRC is one where, if a server (say, A) goes down, another server (say, B) takes over A's cache to serve requests on A's behalf. For this, both A and B need shared persistent storage for the DRC, along the lines of ctdb. One way of achieving shared persistent storage would be to simply use a gluster volume.

* Cache writes to the disk/gluster volume could go the usual two ways: write-back and write-through.
  a. write-back: this avoids the delay of waiting for synchronous writes to the cache, which would be significant given that we need to do it for every request, and over the network to glusterfs at that (one round-trip). It does leave a small window for failure: a cache write can be lost in transit if the writing server goes down just after sending it.
  b. write-through: this adds at least one more network round-trip to every non-idempotent request. Implementing this, IMO, is not worth the performance loss incurred. (A small sketch of the write-through path is appended below my signature.)

We could implement the DRC this way:
1. Have the DRC turned OFF by default.
2. Implement the DRC in three modes:
   * in-memory,
   * local disk cache (cache local to the server), and
   * cluster-aware cache (using glusterfs),
   the last two of which could each be write-back or write-through (five modes in total).
3. Empirically derive an optimal default value for the cache size for each mode.

Choice of data structures I have in mind:
1. For in-memory: two hash tables of pointers to cached replies, one hashed on the {XID, client hostname/IP} pair, the other sorted on time (for LRU eviction of cached replies). Considering that n(cache look-ups) >> n(cache hits), we need the fastest look-ups possible. I would welcome suggestions for faster look-up structures. (A rough sketch is appended below my signature.)
2. For on-disk storage of cached replies, I was thinking of a per-client directory, with each reply stored in a separate file and the XID as the file name. This makes retrieval of cached replies by the fail-over server(s) easy. One problem with this approach in a cluster-aware DRC is that if two clients on the same machine are connected to different servers, XIDs may collide. This can be avoided by appending the server IP/FQDN to the XID in the file name. Also, having to cache multiple replies in one single file would be cumbersome. (The proposed layout is also sketched below.)

We will start with the in-memory implementation and proceed to the other modes. I look forward to suggestions for changes and improvements to the design.

Thanks & Regards,
Rajesh Amaravathi,
Software Engineer, GlusterFS
Red Hat
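To make the in-memory proposal concrete, here is a minimal, hypothetical C sketch of the first table: a chained hash table keyed on {XID, client}, with entries also strung on an intrusive LRU list so eviction can pop the least-recently-used reply. None of the names (drc_*, DRC_BUCKETS) are existing GlusterFS symbols, and insertion/eviction are omitted for brevity.

/* drc_mem.c: hypothetical sketch of an in-memory DRC keyed on {XID, client}. */
#include <stdint.h>
#include <string.h>

#define DRC_BUCKETS 4096

struct drc_entry {
        uint32_t          xid;
        char              client[64];          /* client hostname/IP */
        void             *reply;               /* cached encoded reply */
        size_t            reply_len;
        struct drc_entry *hash_next;           /* chain within a bucket */
        struct drc_entry *lru_prev, *lru_next; /* position in LRU list */
};

struct drc_cache {
        struct drc_entry *buckets[DRC_BUCKETS];
        struct drc_entry *lru_head;            /* most recently used */
        struct drc_entry *lru_tail;            /* eviction candidate */
};

static unsigned int
drc_hash (uint32_t xid, const char *client)
{
        unsigned int h = xid;
        while (*client)
                h = h * 31 + (unsigned char) *client++;
        return h % DRC_BUCKETS;
}

/* Move an entry to the head of the LRU list (most recently used). */
static void
drc_lru_touch (struct drc_cache *cache, struct drc_entry *e)
{
        if (cache->lru_head == e)
                return;
        if (e->lru_prev)
                e->lru_prev->lru_next = e->lru_next;
        if (e->lru_next)
                e->lru_next->lru_prev = e->lru_prev;
        if (cache->lru_tail == e)
                cache->lru_tail = e->lru_prev;
        e->lru_prev = NULL;
        e->lru_next = cache->lru_head;
        if (cache->lru_head)
                cache->lru_head->lru_prev = e;
        cache->lru_head = e;
        if (!cache->lru_tail)
                cache->lru_tail = e;
}

/* O(1) average look-up; since look-ups vastly outnumber hits, the miss
 * path is just one hash computation and a short chain walk. */
struct drc_entry *
drc_lookup (struct drc_cache *cache, uint32_t xid, const char *client)
{
        struct drc_entry *e = cache->buckets[drc_hash (xid, client)];

        for (; e; e = e->hash_next) {
                if (e->xid == xid && strcmp (e->client, client) == 0) {
                        drc_lru_touch (cache, e); /* hit: refresh LRU position */
                        return e;
                }
        }
        return NULL; /* miss: caller executes the op and caches the reply */
}

An intrusive LRU list like this could stand in for the second time-sorted table, keeping both look-up and eviction O(1).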
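For the on-disk layout, this is roughly what I have in mind, expressed as a small self-contained C example. The root directory (/var/lib/glusterfs-drc) and the helper name are purely illustrative.

/* drc_path.c: sketch of the proposed on-disk layout,
 *     <drc-root>/<client-ip>/<xid>-<server-ip>
 * Appending the server IP/FQDN to the XID avoids collisions when two
 * clients on the same machine are connected to different servers.
 * DRC_ROOT and the function name are illustrative only. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DRC_ROOT "/var/lib/glusterfs-drc"

/* Build the file name under which one cached reply is stored. */
static int
drc_reply_path (char *buf, size_t buflen, const char *client_ip,
                const char *server_ip, uint32_t xid)
{
        /* one directory per client, one file per reply */
        return snprintf (buf, buflen, "%s/%s/%08" PRIx32 "-%s",
                         DRC_ROOT, client_ip, xid, server_ip);
}

int
main (void)
{
        char path[256];

        drc_reply_path (path, sizeof (path), "192.168.1.5",
                        "192.168.1.20", 0x1a2b3c4d);
        /* prints: /var/lib/glusterfs-drc/192.168.1.5/1a2b3c4d-192.168.1.20 */
        printf ("%s\n", path);
        return 0;
}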
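And, for completeness, a sketch of the write-through path: the reply file is written and fsync'd before the NFS reply goes out, which is exactly the synchronous cost that write-back avoids at the price of a small loss window. The function name is hypothetical.

/* drc_store.c: sketch of write-through persistence for one cached reply.
 * The write-back variant would hand the same buffer to a background
 * flusher and return immediately. Name is illustrative only. */
#include <fcntl.h>
#include <unistd.h>

/* Persist a reply durably before it is sent to the client. */
int
drc_store_write_through (const char *path, const void *reply, size_t len)
{
        int fd = open (path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

        if (fd < 0)
                return -1;
        if (write (fd, reply, len) != (ssize_t) len || fsync (fd) != 0) {
                close (fd);
                return -1;
        }
        return close (fd);
}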