I can't realistically review most of this code, so I went looking for
some documentation and found this.  Maybe it's not the best starting
point.

Forgive me if I seem dense, I'd just really like to see everything
spelled out very precisely, and neither this nor your original
presentation quite does that for me yet:

On Thu, May 03, 2018 at 02:58:00PM -0400, Jeff Layton wrote:
> + * The rados_grace database is a rados object with a well-known name that
> + * with which all cluster nodes can interact to coordinate grace-period
> + * enforcement.
> + *
> + * It consists of two parts:
> + *
> + * 1) 2 uint64_t epoch values (stored LE) that indicate the serial number of
> + * the current grace period (C) and the serial number of the grace period that

Delete "that".

> + * from which recovery is currently allowed (R). These are stored as object
> + * data.
> + *
> + * 2) An omap containing a key value pair for each cluster node. The key is
> + * the hostname of the node running ganesha, and the value is a byte with a
> + * set of flags.
> + *
> + * Consider a single server epoch (E) of an individual NFS server to be the
> + * period between reboots. That consists of an initial grace period and
> + * a regular operation period. An epoch value of 0 is never valid.

Does "epoch value" mean the same thing as "serial number" above?  I
assume it's something that uniquely identifies an "epoch".

Also you've defined an "epoch" for a single server, it needs definition
for a cluster too, right?

> + *
> + * The first value (C) indicates the current server epoch. The client recovery
> + * db should be tagged with this value on creation, or when updating the db
> + * after the grace period has been fully lifted.

What's the "client recovery db"?  I guess it's the per-node database of
long-form client identifiers identifying clients that are allowed to
reclaim state?
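
To check my reading of part 1, here is a minimal sketch of how I'd expect
the object data to be decoded: 16 bytes, C first and R second, each a
little-endian uint64_t.  The struct and function names are made up for
illustration and are not taken from the patch, and treating C == 0 as
invalid is only my interpretation of "an epoch value of 0 is never valid".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the layout (C then R, both LE) is from the comment. */
struct grace_epochs {
	uint64_t cur;	/* C: serial number of the current grace period */
	uint64_t rec;	/* R: epoch from which recovery is currently allowed */
};

/* Decode one little-endian 64-bit value regardless of host byte order. */
static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Returns 0 on success, -1 if the buffer is too short or C is 0.
 * Rejecting C == 0 is an assumption about what "never valid" means.
 */
static int grace_epochs_decode(const unsigned char *buf, size_t len,
			       struct grace_epochs *out)
{
	if (len < 2 * sizeof(uint64_t))
		return -1;
	out->cur = get_le64(buf);
	out->rec = get_le64(buf + 8);
	if (out->cur == 0)
		return -1;
	return 0;
}
```

If that's not the layout you intend, it would help to have the byte
offsets spelled out in the comment.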
> + *
> + * The second uint64_t value (R) in the data tells the NFS server from what
> + * recovery db it is allowed to reclaim. A value of 0 in this field means that
> + * we are out of the cluster-wide grace period and that no recovery is allowed.
> + *
> + * The omap contains a key for each host in the cluster. Typically, nodes join
> + * the cluster by setting their omap key. The value of the omap is a single
> + * byte that contains a set of flags that indicates their current need for a
> + * grace period and whether they are locally enforcing one.

Is it really just those two flags?  A list of flags here would be
helpful.

--b.

> + *
> + * The grace period handling engine will update and store the flags, and it
> + * can be queried to determine whether other nodes may need a grace period or
> + * are enforcing.
> + */
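
For the flags byte and the R value, this is roughly the set of queries I
would expect from the description.  It is only a sketch: the two bit
names and their values are my guesses (the comment doesn't list them),
and the helper names are hypothetical, not functions from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit assignments; the quoted comment does not define them. */
#define NODE_NEED_GRACE	0x1	/* node currently needs a grace period */
#define NODE_ENFORCING	0x2	/* node is locally enforcing a grace period */

/* R == 0 means we are out of the cluster-wide grace period. */
static bool recovery_allowed(uint64_t rec)
{
	return rec != 0;
}

/* True if at least one node's flags byte has the need-grace bit set. */
static bool any_node_needs_grace(const uint8_t *flags, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (flags[i] & NODE_NEED_GRACE)
			return true;
	return false;
}

/* True if every node has the enforcing bit set (vacuously true for n == 0). */
static bool all_nodes_enforcing(const uint8_t *flags, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!(flags[i] & NODE_ENFORCING))
			return false;
	return true;
}
```

Here flags[] stands in for the omap values, one byte per hostname key.
If there are more bits than these two, or if the bits interact in ways
this doesn't capture, that's the kind of thing I'd like to see written
down.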