Hi Luis,

Thanks for your review/comments. I will take all these into consideration, and start updating the document incrementally asap.

With regards,
Shishir

----- Original Message -----
From: "Luis Pabon" <lpabon@xxxxxxxxxx>
To: "Shishir Gowda" <sgowda@xxxxxxxxxx>
Cc: gluster-devel@xxxxxxxxxx
Sent: Monday, August 19, 2013 7:18:20 AM
Subject: Re: Snapshot design for glusterfs volumes

Hi Shishir,

Thank you for sending out your paper. Here are some comments I have (written in markdown format):

# Review of Online Snapshot Support for GlusterFS

## Section: Introduction

* Primary use case should have a better explanation. It does not explain how the user is currently compensating for not having the technology in their environment, nor the benefits of having the feature.
* Last sentence should explain why it is the same. Why would it be? Can no benefits be gained from having this feature in non-VM image environments? If not, then the name should be changed to vmsnapshots or something that discourages usage in environments other than VM image storage.

## Section: Snapshot Architecture

* The architecture section does not talk about architecture, but instead focuses on certain modes of operation. Please explain how a user, from either a client or something like an OpenStack interface, interacts with the snapshots. Also describe in good detail all aspects of operation (delete, create, etc.). Describe the concept of Barriers here instead of at the end of the document.
* I'm new to GlusterFS, but I am confused about what is meant by bullet #3: "The planned support is for GlusterFS Volume based snapshots...". It seems like the sentence is not finished. Do you mean "The planned support is for snapshots of GlusterFS volumes..."? Also, how is brick coherency kept across multiple AFR nodes?
* Snapshot Consistency section is confusing, please reword the description. Maybe change the format to paragraphs instead of bullets.
* Please explain why there is a snapshot limit of 256.
Are we using only one byte for tracking a snapshot ID?
* When the CLI executes multiple volume snapshots, is it possible to execute them in parallel? Why do they need to be serially processed?
* What happens when `restore` is executed? How does the volume state change? Does the .gluster directory change in any way?
* What happens when `delete` is executed? When we have the following snaps `A->B->C->D`, and we delete `B`, what happens to the state of the volume? Do the changes from `B` get merged into `A` so that it provides the dependencies needed by `C`?
* Using the example above, can I branch or clone from `B` to `B'` and create a *new* volume? I am guessing that the LVM technology would probably not allow this, but maybe btrfs would.

## Section: Data Flow

* This section is confusing. Why are they bullets if they read as a sequence? This seems to me more like a project requirements list than a data flow description.
* What are the side effects of acquiring the cluster-wide lock? What benefits/concerns should it have on a system with N nodes?
* What is the average amount of time the CLI will expect to be blocked before it returns?
* I am not sure if we have something like this already, but we may want to discuss the concept of a JOB manager. For example, here the CLI will send a request which may take longer than 3 secs. In such a situation, the CLI would be returned a JOB ticket number. The user can then query the JOB manager and provide the ticket number for status, or provide a callback mechanism (which is a little harder, but possible to do). In any case, I think this JOB manager falls outside the scope of this paper, but is something we should revisit if we do not already possess one.
* The bullet "Once barrier is on, initiate back-end snapshot." should explain in greater detail what is meant by "back-end snapshot".
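
To make the JOB manager idea from the Data Flow comments concrete, here is a minimal Python sketch of the ticket-and-poll pattern. It is only an illustration under stated assumptions: `JobManager`, `submit`, `status`, `result`, and `fake_snapshot_create` are hypothetical names invented for this example, not GlusterFS or glusterd APIs, and a thread pool stands in for whatever would actually run the operation.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class JobManager:
    """Hypothetical job manager: runs slow operations in the background
    and hands the caller a ticket number that can be polled later."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tickets = itertools.count(1)  # ticket numbers start at 1
        self._jobs = {}                     # ticket -> Future
        self._lock = threading.Lock()       # guards _tickets and _jobs

    def submit(self, func, *args):
        """Start func(*args) in the background and return a ticket number."""
        future = self._pool.submit(func, *args)
        with self._lock:
            ticket = next(self._tickets)
            self._jobs[ticket] = future
        return ticket

    def status(self, ticket):
        """Return 'unknown', 'running', 'failed', or 'done'.

        'running' also covers jobs that are still queued.
        """
        with self._lock:
            future = self._jobs.get(ticket)
        if future is None:
            return "unknown"
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def result(self, ticket, timeout=None):
        """Block until the job finishes, then return its result."""
        with self._lock:
            future = self._jobs[ticket]
        return future.result(timeout=timeout)

    def shutdown(self):
        """Wait for outstanding jobs and release the worker threads."""
        self._pool.shutdown(wait=True)


def fake_snapshot_create(volume):
    # Stand-in for a slow snapshot request; not a real GlusterFS call.
    return f"snapshot of {volume} created"
```

A CLI built on this pattern would call `submit(fake_snapshot_create, "vol0")`, print the returned ticket, and exit immediately; a later invocation would pass the ticket to `status` or `result`. The callback variant mentioned above could be layered on the same futures, but is left out to keep the sketch short.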

## Section: CLI Interface

* Each one of these commands should be explained in the architecture section in fine detail, covering how they affect volume state and what side effects they have.

## Section: Snapshot Design

* Does the amount of content in a brick affect the create, delete, list, or restore snapshot time?
* The paper only describes `create` in the first part of the section. There probably should be a subsection for each of the commands supported, each describing in detail how they are planned to be implemented.
* Could there be a section showing how JSON/XML interfaces would support this feature?

### Subsection: Stage-1 Prepare

* Are barriers on multiple bricks executed serially? What is the maximum number of bricks supported by the snapshot feature before it takes an unusual amount of time to execute? Should brick barriers be done in parallel?
* This again seems like a requirement list and sometimes like a sequence. Please reword the section.

## Section: Barrier

* Paragraph states "unless Asynchronous IO is used". How does that affect the barrier and snapshots? The paper does not describe this situation.
* A description of the planned Barrier design will help in understanding what is meant by queuing of fops.
* Will the barrier be implemented as a new xlator which will be inserted on the fly when a snapshot is requested, or will it require changes to existing xlators? If it is not planned to be a xlator, should it be implemented as such to provide code isolation?
* Why are `write` and `unlink` not fops to be barriered? Do barriers still allow disk changes? Maybe the paper should describe why it allows certain calls to affect the disk and how these changes may or may not affect the snapshot or the volume state.

## Section: Snapshot management

* Item `#2` is confusing, please reword.
* Item `#3` says that individual snapshots will not be supported. If that is true, then what does `delete` do?
* Item `#7` is confusing. Please reword.
The paper should state why the user and developer need to know this information.
* Item `#8` is confusing. Is the item stating that the user can only run certain commands on a volume snapshot restore? If this is true, are volume snapshot restores not a true volume restore, where the volume is back to a previous state? What is the benefit of this feature to the user?
* Item `#9` seems like an outline for the `delete` design. There needs to be more information here, in greater detail, as discussed above.
* Item `#10` needs to describe why it is proposing that a restored snapshot is shown as a snapshot volume. Is a volume with snapshots not also identified as a snap volume?

## Section: Error Scenarios

* Please reword item `#3`.

## Section: Open-ended issues

* Item `#4` is confusing. Please reword.
* Item `#6` suggests that snapshot volumes can be mounted. Can a snapshot *and* the latest volume be mounted at the same time? If the volume is `reverted` to a previous snapshot so that the user can inspect the volume state, I highly suggest keeping all snapshot mounts Read-Only. If the user wants to write to that mount, they should delete all snapshots to that point. I highly discourage this feature from dealing with merges.
* Item `#8` does not describe what will happen if a re-balance is initiated. Will snaps be deleted? I do not think these constraints are a good alternative. In my opinion, the snapshot feature should support all GlusterFS high availability features.
* Item `#9` does not describe what the `master` volume is. Does it mean that the user cannot revert to a previous snapshot? If this is true, does that not violate the original requirement?

## Section: Upgrade/Downgrade

* This section describes that snap state will be maintained in `/var/lib/glusterd...`. The paper needs to describe snapshot state in greater detail in the `Design` section.
For example, what state is kept in `/var/lib/glusterd...` and what state is read from the underlying file system snapshot technology? What happens when the underlying file system snapshot technology has one state and `/var/lib/glusterd...` has another?

Look forward to your reply.

- Luis

On 08/02/2013 02:26 AM, Shishir Gowda wrote:
> Hi All,
>
> We propose to implement snapshot support for glusterfs volumes in release-3.6.
>
> Attaching the design document in the mail thread.
>
> Please feel free to comment/critique.
>
> With regards,
> Shishir
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> https://lists.nongnu.org/mailman/listinfo/gluster-devel