Hi,

Today we discussed the GlusterFS backup API. Our plan is to provide a tool/API that returns the list of changed files (full or incremental).

Participants: Me, Kotresh, Ajeet, Shilpa

Thanks to Paul Cuzner for providing inputs about the pre and post hooks available in backup utilities like NetBackup.

Initial draft:
==============

Case 1 - Registered Consumer
----------------------------
The consumer application has to register by giving a session name.

    glusterbackupapi register <sessionname> <host> <volume>

When the following command is run for the first time, it does a full scan; subsequent runs are incremental. The start time for an incremental run is the last backup time, and the end time is the current time.

    glusterbackupapi <sessionname> --out-file=out.txt

--out-file is an optional argument; the default output file name is `output.txt`. The output file contains file paths.

Case 2 - Unregistered Consumer
------------------------------
Start time and end time are not remembered, so the consumer has to send both every time it wants an incremental list.

For a full backup,

    glusterbackupapi full <host> <volume> --out-file=out.txt

For an incremental backup,

    glusterbackupapi inc <host> <volume> <STARTTIME> <ENDTIME> --out-file=out.txt

where STARTTIME and ENDTIME are in Unix timestamp format.

Technical overview
==================
1. Using the host and volume name arguments, the tool fetches volume info and volume status to get the list of bricks/nodes that are up.
2. It executes a brick/node agent to get the required details from each brick. (TBD: communication via RPC/SSH/gluster system:: execute)
3. For a full scan, the brick/node agent gets the list of files from that brick's backend and generates an output file.
4. For an incremental scan, the agent calls the Changelog History API, gets the list of distinct GFIDs and then converts each GFID to a path.
5. The output files generated on each brick node are copied to the initiator node.
6. The initiator merges all the output files from the bricks and removes duplicates.
7. In case of session-based access, session information is saved by each brick/node agent.

Issues/Challenges
=================
1. Timestamps may differ between Gluster nodes. We are assuming that the time is the same on all nodes in a cluster.
2. If a brick is down, how do we handle it? We are assuming that all the bricks are up when a backup is initiated (at least one from each replica).
3. If the changelog is not available, or is broken somewhere between start time and end time, how do we get the incremental file list? As a prerequisite, changelog should be enabled before backup.
4. GFID to path conversion: using `find -samefile`, or using the `glusterfs.pathinfo` xattr on an aux-gfid-mount.
5. Deleted files: if we get the GFID of a deleted file from the changelog, how do we find its path? Does the backup API require a list of deleted files?
6. Storing session info on each brick node.
7. Communication channel between nodes: RPC/SSH/gluster system:: execute... etc.?

Kotresh, Ajeet, please add anything I missed.

--
regards
Aravinda
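
A rough sketch of step 6 in the technical overview (merging the per-brick output files on the initiator node and removing duplicates) might look like the following. It assumes each brick output file holds one file path per line, as described for the output file above. The function name `merge_brick_outputs` and the file names in the demo are hypothetical and are not part of any existing Gluster tool.

```python
import os
import tempfile


def merge_brick_outputs(brick_files, out_file="output.txt"):
    """Merge per-brick path lists into one de-duplicated, sorted file.

    Assumes one path per line in every input file (an assumption of this
    sketch). Returns the number of unique paths written.
    """
    unique_paths = set()
    for brick_file in brick_files:
        with open(brick_file, "r", encoding="utf-8") as handle:
            for line in handle:
                path = line.rstrip("\n")
                if path:  # ignore blank lines
                    unique_paths.add(path)

    # Sorted output keeps the result stable across runs.
    with open(out_file, "w", encoding="utf-8") as handle:
        for path in sorted(unique_paths):
            handle.write(path + "\n")
    return len(unique_paths)


if __name__ == "__main__":
    # Demo with two made-up brick outputs that overlap, as replica
    # bricks would.
    workdir = tempfile.mkdtemp()
    brick1 = os.path.join(workdir, "brick1.txt")
    brick2 = os.path.join(workdir, "brick2.txt")
    with open(brick1, "w", encoding="utf-8") as f:
        f.write("/dir/a.txt\n/dir/b.txt\n")
    with open(brick2, "w", encoding="utf-8") as f:
        f.write("/dir/b.txt\n/dir/c.txt\n")
    merged = os.path.join(workdir, "output.txt")
    count = merge_brick_outputs([brick1, brick2], merged)
    print(count, "unique paths written to", merged)
```

Holding all paths in a set is the simplest approach but keeps every path in memory; for very large volumes an external sort followed by a streaming de-duplication pass may be a better fit.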