hi,
File snapshots are becoming important in the context of providing virtual block devices in containers, so that we can take a snapshot of the block device, switch between different snapshots, etc. There have been some attempts at designing such solutions, and this is a very early look at some of the solutions proposed so far. Please let us know what you think about these, and feel free to add any more solutions you may have for this problem.

Assumptions:
- Snap a single file
- File is accessed by a single client (VM images, block store for containers, etc.)
As there is a single client that accesses the file/image, the file read/write (or other modification FOPs) can act on a version number of the file (read as part of lookup, say, and communicated to the other FOPs as part of the xdata).
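For illustration, here is a minimal sketch of that idea; all names are hypothetical and the struct field merely stands in for the real lookup/xdata plumbing. The client caches the version it learned at lookup and stamps it onto every modification FOP:

/* Minimal sketch (hypothetical names): the client learns the file's
 * snapshot version at lookup time and tags every later modification
 * FOP with it, the way a real client would stuff it into xdata. */
#include <stdint.h>
#include <stdio.h>

struct client_ctx {
    uint64_t file_version;      /* learned from lookup */
};

struct write_req {
    uint64_t offset;
    uint64_t size;
    uint64_t version;           /* carried alongside the FOP (xdata stand-in) */
};

/* Pretend the lookup reply carried a "file-version" value. */
static void on_lookup_reply(struct client_ctx *ctx, uint64_t version_from_server)
{
    ctx->file_version = version_from_server;
}

/* Every write is stamped with the cached version so bricks can compare
 * it against the version stored on the block being written. */
static struct write_req make_write(const struct client_ctx *ctx,
                                   uint64_t offset, uint64_t size)
{
    struct write_req req = { .offset = offset, .size = size,
                             .version = ctx->file_version };
    return req;
}

int main(void)
{
    struct client_ctx ctx;
    on_lookup_reply(&ctx, 3);                 /* e.g. snap version 3 */
    struct write_req w = make_write(&ctx, 4096, 512);
    printf("write off=%llu ver=%llu\n",
           (unsigned long long)w.offset, (unsigned long long)w.version);
    return 0;
}

In Gluster this would presumably be a key set in the xdata dict of each FOP rather than a struct field, but the flow is the same.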
1) Doing file snapshots using shards: (This was suggested by Shyam; I have tried to keep the text as is.)
If a block of such a file is written to with a higher version, the brick xlators can perform a block copy, mark the new block with the new version, and leave the older version as is.
This means that, to snap such a file, only the first shard needs a higher version #, and the client that is operating on this file needs to be updated with this version (mostly the client would be the one taking the snap, but even otherwise). To update the client we can leverage the granted lease: by revoking it, we force the client to reacquire the lease by visiting the first shard (if we need to coordinate the client's writes post-snap, this is more or less a must).
Anyway, the bottom line is that a shard does not know a snap is taken; rather, when a data modification operation is sent to the shard, it then acts on preserving the older block.
This leaves blocks with various versions on disk, and when an older snap (version) is deleted, the corresponding blocks are freed.
A sparse block for a version never exists in this method, i.e. when taking a snap, if a shard did not exist, then there is no version of it that is preserved, and hence it remains an empty/sparse block. (A rough sketch of this brick-side behaviour follows the pros/cons below.)
Pros: good distribution of the shards across different servers and efficient usage of the space available
Cons: Difficult to give data locality for the applications that may demand it.
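To make the brick-side logic concrete, here is a rough sketch; the in-memory arrays, names and sizes are made up for illustration, standing in for the per-shard files on a brick. Each shard remembers the version it was last written with; a write carrying a newer version first preserves the old contents under the old version, then overwrites in place and bumps the version:

/* Rough sketch of the copy-on-write-on-version-mismatch idea (option 1). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SHARD_SIZE 8
#define MAX_SNAPS  4

struct shard {
    char     data[SHARD_SIZE];
    uint64_t version;                       /* version of the live data */
    char     snaps[MAX_SNAPS][SHARD_SIZE];  /* preserved older versions */
    uint64_t snap_versions[MAX_SNAPS];
    int      nsnaps;
};

static void shard_write(struct shard *s, const char *buf, uint64_t write_version)
{
    if (write_version > s->version) {
        /* A snap was taken since the last write: preserve the old block
         * under its old version before touching it. */
        if (s->nsnaps < MAX_SNAPS) {
            memcpy(s->snaps[s->nsnaps], s->data, SHARD_SIZE);
            s->snap_versions[s->nsnaps] = s->version;
            s->nsnaps++;
        }
        s->version = write_version;
    }
    memcpy(s->data, buf, SHARD_SIZE);       /* ordinary in-place write */
}

int main(void)
{
    struct shard s = { .version = 1 };
    shard_write(&s, "AAAAAAA", 1);          /* pre-snap write, no copy */
    shard_write(&s, "BBBBBBB", 2);          /* post-snap write: old data preserved */
    printf("live v%llu=%s, preserved v%llu=%s\n",
           (unsigned long long)s.version, s.data,
           (unsigned long long)s.snap_versions[0], s.snaps[0]);
    return 0;
}

In a real implementation the preserved copies would of course be separate shard files on the brick, named/tagged by version, rather than an in-memory array, and deleting a snap would free all blocks stored under that version.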
2) Doing file snapshots using a block bitmap:
This is somewhat inspired by the granular data self-heal idea we wanted to implement in afr, where each block/shard used in the file is logically represented by a bitmap stored either as an xattr or written to a metafile; there is no physical division of the file into different shards. When a snapshot is taken, a new sparse file of the same size is created, and new writes on the file are redirected to this file instead of the original, thus preserving the old file. When a write is performed on this file, we note which block is going to be written, copy that block out from the older file, overwrite the buffer, write it to the new version, and mark the block as used in the xattr/metafile. (A rough sketch follows the pros/cons below.)
Pros: Easier to give data locality for the applications that may demand it.
Cons: inefficient usage of the space available; we may end up with uneven usage among different servers in the cluster.
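Here is a rough sketch of that copy-up-on-first-write behaviour, with in-memory stand-ins for the old file, the new sparse file and the bitmap (names and sizes are made up for illustration):

/* Rough sketch of option 2: the old file is left untouched after a snap;
 * a bitmap records which blocks of the new version have been populated.
 * The first write to a block copies it forward from the old file. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4
#define NBLOCKS    8

struct snapped_file {
    char    old_data[NBLOCKS][BLOCK_SIZE];  /* preserved pre-snap contents */
    char    new_data[NBLOCKS][BLOCK_SIZE];  /* sparse "new version" file */
    uint8_t bitmap[NBLOCKS];                /* 1 = block present in new version */
};

/* Read goes to the new version if the block was copied/written there,
 * otherwise falls through to the preserved old file. */
static const char *file_read_block(const struct snapped_file *f, int blk)
{
    return f->bitmap[blk] ? f->new_data[blk] : f->old_data[blk];
}

/* Write: copy the old block up on first touch, then overwrite. */
static void file_write_block(struct snapped_file *f, int blk, const char *buf,
                             int off, int len)
{
    if (!f->bitmap[blk]) {
        memcpy(f->new_data[blk], f->old_data[blk], BLOCK_SIZE);
        f->bitmap[blk] = 1;                 /* mark block used (xattr/metafile) */
    }
    memcpy(f->new_data[blk] + off, buf, len);
}

int main(void)
{
    struct snapped_file f = {0};
    memcpy(f.old_data[2], "old", 4);        /* pre-snap contents of block 2 */

    file_write_block(&f, 2, "N", 0, 1);     /* partial write after the snap */
    printf("old=%s new=%s\n", f.old_data[2], file_read_block(&f, 2));
    return 0;
}

In the real thing the bitmap would live in an xattr or metafile as described above, and the copy-up would be a pread from the old file plus a pwrite to the new sparse file.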
3) Doing file snapshots by using the reflink functionality given by the underlying FS:
When a snapshot request comes, we just reflink the earlier file to the latest version, and new writes are redirected to this new version of the file. (A rough sketch follows the pros/cons below.)
Pros: Easiest to implement among all the three, easier to give data locality for the applications that may demand it.
Cons: FS-specific, i.e. not going to work on disk filesystems that don't support file snapshots; this also has the same problem as 2) above, i.e. inefficient usage of the space available, and we may end up with uneven usage among different servers in the cluster.
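A rough sketch of the snapshot step itself, assuming a Linux backend FS with reflink support (XFS with reflink=1, Btrfs); the file names are made up for illustration:

/* Rough sketch of option 3: take a "snapshot" by reflinking the current
 * file to a version name, then keep writing to the live file.  FICLONE
 * needs a filesystem with reflink support. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>       /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Clone 'src_path' into 'dst_path', sharing the same extents on disk. */
static int reflink_file(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    int ret = -1;

    if (src >= 0 && dst >= 0)
        ret = ioctl(dst, FICLONE, src);     /* shares extents, O(1) copy */

    if (src >= 0) close(src);
    if (dst >= 0) close(dst);
    return ret;
}

int main(void)
{
    /* Preserve the current contents under a version name; new writes then
     * go to the other file, depending on the naming scheme chosen. */
    if (reflink_file("image.raw", "image.raw.v1") != 0) {
        perror("reflink");
        return 1;
    }
    return 0;
}

(On such a filesystem, cp --reflink=always does the same thing from the shell, which is handy for experimenting.)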
--
Pranith