On Wed, 2018-05-16 at 17:33 -0400, J. Bruce Fields wrote:
> I can't realistically review most of this code, so I went looking for
> some documentation and found this. Maybe it's not the best starting
> point. Forgive me if I seem dense, I'd just really like to see
> everything spelled out very precisely, and neither this nor your
> original presentation quite does that for me yet:
>

Thanks for looking! Yes, the comments here are a mess. I'll clean them
up before the next posting. Maybe I'll just transfer this to a RST doc
and refer to it in the comments.

> On Thu, May 03, 2018 at 02:58:00PM -0400, Jeff Layton wrote:
> > + * The rados_grace database is a rados object with a well-known name that
> > + * with which all cluster nodes can interact to coordinate grace-period
> > + * enforcement.
> > + *
> > + * It consists of two parts:
> > + *
> > + * 1) 2 uint64_t epoch values (stored LE) that indicate the serial number of
> > + * the current grace period (C) and the serial number of the grace period that
>
> Delete "that".
>
> > + * from which recovery is currently allowed (R). These are stored as object
> > + * data.
> > + *
> > + * 2) An omap containing a key value pair for each cluster node. The key is
> > + * the hostname of the node running ganesha, and the value is a byte with a
> > + * set of flags.
> > + *
> > + * Consider a single server epoch (E) of an individual NFS server to be the
> > + * period between reboots. That consists of an initial grace period and
> > + * a regular operation period. An epoch value of 0 is never valid.
>
> Does "epoch value" mean the same thing as "serial number" above? I
> assume it's something that uniquely identifies an "epoch".
>

Yes.

> Also you've defined an "epoch" for a single server, it needs definition
> for a cluster too, right?

I'll do that. Basically the epoch is a cluster-wide property. It's just
that with a single server, you have a trivial cluster of one host.

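
To make the object data layout in part 1 concrete, here is a rough sketch
of packing and unpacking the two epoch values as little-endian uint64_t.
This is only an illustration, not the code from the patch: the names
(GRACE_DATA_LEN, grace_data_encode, grace_data_decode) are made up, and it
assumes C is stored first and R second, with nothing else in the object
data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these come from the patch. */
#define GRACE_DATA_LEN 16	/* two uint64_t values: C, then R */

/* Store v as 8 little-endian bytes, regardless of host byte order. */
static void put_le64(unsigned char *buf, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (unsigned char)(v >> (8 * i));
}

/* Read 8 little-endian bytes back into a host-order uint64_t. */
static uint64_t get_le64(const unsigned char *buf)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)buf[i] << (8 * i);
	return v;
}

/* Pack the current epoch (C) followed by the recovery epoch (R). */
static void grace_data_encode(unsigned char buf[GRACE_DATA_LEN],
			      uint64_t cur, uint64_t rec)
{
	put_le64(buf, cur);
	put_le64(buf + 8, rec);
}

/*
 * Unpack C and R. Returns 0 on success, -1 if the buffer has the wrong
 * length or C is 0 (an epoch value of 0 is never valid). R is allowed
 * to be 0 here, since the comment uses that to mean "no recovery".
 */
static int grace_data_decode(const unsigned char *buf, size_t len,
			     uint64_t *cur, uint64_t *rec)
{
	if (len != GRACE_DATA_LEN)
		return -1;
	*cur = get_le64(buf);
	*rec = get_le64(buf + 8);
	return *cur == 0 ? -1 : 0;
}
```

Doing the byte shuffling by hand keeps the sketch independent of
platform-specific endian helpers; real code would more likely use
whatever conversion macros the tree already provides.
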
> > + *
> > + * The first value (C) indicates the current server epoch. The client recovery
> > + * db should be tagged with this value on creation, or when updating the db
> > + * after the grace period has been fully lifted.
>
> What's the "client recovery db"? I guess it's the per-node database of
> long-form client identifiers identifying clients that are allowed to
> reclaim state?
>

Yes, exactly.

> > + *
> > + * The second uint64_t value
>
> (R)
>
> > in the data tells the NFS server from what
> > + * recovery db it is allowed to reclaim. A value of 0 in this field means that
> > + * we are out of the cluster-wide grace period and that no recovery is allowed.
> > + *
> > + * The omap contains a key for each host in the cluster. Typically, nodes join
> > + * the cluster by setting their omap key. The value of the omap is a single
> > + * byte that contains a set of flags that indicates their current need for a
> > + * grace period and whether they are locally enforcing one.
>
> Is it really just those two flags? A list of flags here would be
> helpful.
>

Yes, just those two flags at this time. I'll plan to list them in a more
tabular way.

> > + *
> > + * The grace period handling engine will update and store the flags, and it
> > + * can be queried to determine whether other nodes may need a grace period or
> > + * are enforcing.
> > + */

Thanks for the review!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>