On Tuesday, March 1, 2016 1:15:06 PM, Kaushal M wrote:
> On Tue, Mar 1, 2016 at 12:37 PM, Prasanna Kumar Kalever
> <pkalever@xxxxxxxxxx> wrote:
> > Hello Gluster,
> >
> > Introducing a new file-based snapshot feature in Gluster, built on the
> > reflink feature that will be out in XFS in a couple of months
> > (downstream).
> >
> > What is a reflink?
> >
> > You have surely used softlinks and hardlinks every day!
> >
> > Reflinks support transparent copy-on-write, which makes them useful for
> > snapshotting, unlike soft/hardlinks. Basically, a reflink points to the
> > same data blocks that are used by the actual file (the blocks are common
> > to the real file and the reflink, hence space efficient). Reflinks use
> > different inode numbers, so they can have different permissions for
> > accessing the same data blocks. Although they may look similar to
> > hardlinks, they are more space efficient and support all operations that
> > can be performed on a regular file, unlike hardlinks, which are limited
> > to unlink().
> >
> > Which filesystems support reflinks?
> > I think Btrfs was the first to implement them; now XFS is working hard
> > to make them available, and in the future we may see them in ext4 as
> > well.
> >
> > You can get a feel for reflinks by following this tutorial:
> > https://pkalever.wordpress.com/2016/01/22/xfs-reflinks-tutorial/
> >
> > POC in gluster: https://asciinema.org/a/be50ukifcwk8tqhvo0ndtdqdd?speed=2
> >
> > How are we doing it?
> > Currently there is no dedicated system call that gives a handle to
> > reflinks, so I decided to go with an ioctl() call using the
> > XFS_IOC_CLONE command.
> >
> > In the POC I have used setxattr/getxattr to create/delete/list the
> > snapshots. The restore feature will use setxattr as well.
> >
> > We could have a dedicated fop, but since FUSE doesn't understand it, we
> > will manage with a setxattr at the FUSE mount point; from the client
> > side it again travels as a fop down to the posix xlator and then as an
> > ioctl to the underlying filesystem.
> > Planning to expose APIs for create, delete, list and restore.
> >
> > Are these snapshots internal or external?
> > We will have a separate file each time we create a snapshot; the
> > snapshot file will have a different inode number and will be read-only.
> > All these files are maintained in the ".fsnap/" directory kept inside
> > the parent directory where the snapshotted/actual file resides, so they
> > will not be visible to the user (even with the ls -a option, just like
> > USS).
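> >
> > To make the clone step concrete, roughly all it takes is one ioctl() on
> > the destination fd, something like the untested sketch below (it assumes
> > a kernel/xfsprogs recent enough to expose XFS_IOC_CLONE via
> > <xfs/xfs_fs.h>; on newer kernels FICLONE from <linux/fs.h> is the
> > equivalent):
> >
> >     /* reflink-clone src into dest; both must live on the same XFS
> >      * filesystem with reflink support enabled */
> >     #include <fcntl.h>
> >     #include <stdio.h>
> >     #include <sys/ioctl.h>
> >     #include <unistd.h>
> >     #include <xfs/xfs_fs.h>   /* XFS_IOC_CLONE */
> >
> >     int main(int argc, char *argv[])
> >     {
> >         if (argc != 3) {
> >             fprintf(stderr, "usage: %s <src> <dest>\n", argv[0]);
> >             return 1;
> >         }
> >
> >         int src = open(argv[1], O_RDONLY);
> >         int dest = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
> >         if (src < 0 || dest < 0) {
> >             perror("open");
> >             return 1;
> >         }
> >
> >         /* dest now shares src's data blocks; a write to either file
> >          * copies only the touched blocks (copy-on-write) */
> >         if (ioctl(dest, XFS_IOC_CLONE, src) < 0) {
> >             perror("ioctl(XFS_IOC_CLONE)");
> >             return 1;
> >         }
> >
> >         close(src);
> >         close(dest);
> >         return 0;
> >     }
> >
> > The posix xlator would end up doing essentially the same ioctl on the
> > brick, with the destination being the read-only snapshot file under
> > ".fsnap/".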
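> >
> > And from the application side, the xattr-based interface described above
> > could be driven roughly like this (the key names used here are purely
> > illustrative placeholders, not the final interface):
> >
> >     /* hypothetical snapshot create/list/restore/delete via xattrs on a
> >      * FUSE mount; the "trusted.glusterfs.fsnap.*" keys and the file
> >      * path are made up for illustration only */
> >     #include <stdio.h>
> >     #include <string.h>
> >     #include <sys/types.h>
> >     #include <sys/xattr.h>
> >
> >     int main(void)
> >     {
> >         const char *file = "/mnt/gluster/vm1.img";   /* example path */
> >
> >         /* create a snapshot named "snap1" */
> >         if (setxattr(file, "trusted.glusterfs.fsnap.create",
> >                      "snap1", strlen("snap1"), 0) < 0)
> >             perror("snapshot create");
> >
> >         /* list existing snapshots */
> >         char buf[4096];
> >         ssize_t len = getxattr(file, "trusted.glusterfs.fsnap.list",
> >                                buf, sizeof(buf) - 1);
> >         if (len >= 0) {
> >             buf[len] = '\0';
> >             printf("snapshots: %s\n", buf);
> >         }
> >
> >         /* restore to "snap1", then delete it */
> >         setxattr(file, "trusted.glusterfs.fsnap.restore",
> >                  "snap1", strlen("snap1"), 0);
> >         setxattr(file, "trusted.glusterfs.fsnap.delete",
> >                  "snap1", strlen("snap1"), 0);
> >
> >         return 0;
> >     }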
> >
> > *** We can always restore to any snapshot available in the list, and the
> > best part is that we can delete any snapshot between snapshot1 and
> > snapshotN, because all of them are independent. ***
> >
> > It is the application's duty to ensure the consistency of the file
> > before it tries to create a snapshot; for example, in the case of a VM
> > file snapshot it is the hypervisor that should freeze the I/O and then
> > request the snapshot.
> >
> > Integration with gluster: (initial state, needs more investigation)
> >
> > Quota:
> > Since the snapshot files reside in the ".fsnap/" directory kept inside
> > the same directory where the actual file exists, they fall under the
> > same user's quota :)
> >
> > DHT:
> > As said, the snapshot files will reside in the same directory as the
> > actual file, i.e. in its ".fsnap/" directory.
> >
> > Re-balancing:
> > The simplest solution could be: copy the actual file as a whole, then
> > for the snapshot files rsync only the deltas and recreate the snapshot
> > history by repeating the snapshot sequence after each snapshot-file
> > rsync.
> >
> > AFR:
> > Mostly the same as a write fop (inodelks and quorum). There may be no
> > way to recover or recreate a snapshot on a node (a brick, to be precise)
> > that was down while the snapshot was taken and comes back later in time.
> >
> > Disperse:
> > Taking the inodelk and snapshotting the file on each of the bricks
> > should mostly work.
> >
> > Sharding:
> > Assume we have a file split into 4 shards. If the fop to take a snapshot
> > is sent to all the subvols holding the shards, it would be sufficient.
> > All shards will then have a snapshot of that shard's state.
> > The list-snaps fop should be sent only to the main subvol where shard 0
> > resides.
> > Deleting a snap should be similar to creating one.
> > Restore would be a little difficult because the metadata of the file
> > needs to be updated in the shard xlator.
> > <Needs more investigation>
> > Also, in the case of sharding the bricks have a gfid-based flat
> > filesystem. Hence the snaps created will also be in the shard directory,
> > so quota is not straightforward and needs additional work in this case.
> >
> > How can we make it better?
> > Discussion page: http://pad.engineering.redhat.com/kclYd9TPjr
>
> This link is not accessible externally. Could you move the contents to
> a public location?

Thanks Kaushal, I have copied it to
https://public.pad.fsfe.org/p/Snapshots_in_glusterfs
Let's use this from now on.

-Prasanna

> >
> > Thanks to "Pranith Kumar Karampuri", "Raghavendra Talur", "Rajesh Joseph",
> > "Poornima Gurusiddaiah" and "Kotresh Hiremath Ravishankar"
> > for all the initial discussions.
> >
> > -Prasanna
> >
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel