Hi Aravinda,

Looks good to me.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Aravinda" <avishwan@xxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Wednesday, April 1, 2015 2:30:49 PM
> Subject: Re: Improving Geo-replication Status and Checkpoints
>
> Hi,
>
> Each node of the Master Cluster runs one Monitor process and one or more
> worker processes for the bricks on that node.
> The Monitor will have a status file, which will be updated by glusterd.
> Possible status values in the monitor_status file are Created, Started,
> Paused and Stopped.
>
> Geo-rep cannot be paused if the monitor status is not "Started".
>
> Based on monitor_status, we need to hide some of the information in the
> brick status file from the user. For example, if the monitor status is
> "Stopped", it does not make sense to show "Crawl Status" in the Geo-rep
> Status output. I created a matrix of possible status values based on the
> status of the Monitor. VALUE represents the actual, unchanged value from
> the brick status file.
>
> Monitor Status --->         Created    Started    Paused     Stopped
> ----------------------------------------------------------------------
> session                     VALUE      VALUE      VALUE      VALUE
> brick                       VALUE      VALUE      VALUE      VALUE
> node                        VALUE      VALUE      VALUE      VALUE
> node_uuid                   VALUE      VALUE      VALUE      VALUE
> volume                      VALUE      VALUE      VALUE      VALUE
> slave_user                  VALUE      VALUE      VALUE      VALUE
> slave_node                  N/A        VALUE      VALUE      N/A
> status                      Created    VALUE      Paused     Stopped
> last_synced                 N/A        VALUE      VALUE      VALUE
> crawl_status                N/A        VALUE      N/A        N/A
> entry                       N/A        VALUE      N/A        N/A
> data                        N/A        VALUE      N/A        N/A
> meta                        N/A        VALUE      N/A        N/A
> failures                    N/A        VALUE      VALUE      VALUE
> checkpoint_completed        N/A        VALUE      VALUE      VALUE
> checkpoint_time             N/A        VALUE      VALUE      VALUE
> checkpoint_completed_time   N/A        VALUE      VALUE      VALUE
>
> Where:
> session - Only in XML output. Complete session URL, as used in the
>           Create command
> brick - Master Brick Node
> node - Master Node
> node_uuid - Master Node UUID, only in XML output
> volume - Master Volume
> slave_user - Slave User
> slave_node - Slave node to which the respective master worker is connected
> status - Created/Initializing../Active/Passive/Faulty/Paused/Stopped
> last_synced - Last synced time
> crawl_status - Hybrid/History/Changelog
> entry - Number of entry ops pending (per session; the counter resets if
>         the worker restarts)
> data - Number of data ops pending (per session; the counter resets if
>        the worker restarts)
> meta - Number of meta ops pending (per session; the counter resets if
>        the worker restarts)
> failures - Number of failures (if the count is more than 0, the admin
>            should look in the log files)
> checkpoint_completed - Checkpoint status, Yes/No/N/A
> checkpoint_time - Checkpoint set time or N/A
> checkpoint_completed_time - Checkpoint completed time or N/A
>
> In addition to the monitor_status rules, if the brick status is Faulty,
> the following fields will be displayed as N/A:
> active, paused, slave_node, crawl_status, entry, data, metadata
>
> Let me know your thoughts.
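The matrix above can be read as a masking rule applied to each row of the brick status file. A minimal Python sketch of that reading follows. The names `HIDDEN_FIELDS`, `FAULTY_HIDDEN` and `visible_status` are hypothetical and not taken from the geo-rep code. It also assumes that the Faulty rule applies whatever the monitor status is, and that "metadata" in the Faulty list refers to the `meta` column.

```python
NA = "N/A"

# Fields shown as N/A for each monitor status, transcribed from the matrix.
_CRAWL_FIELDS = {"crawl_status", "entry", "data", "meta"}
HIDDEN_FIELDS = {
    "Created": _CRAWL_FIELDS | {
        "slave_node", "last_synced", "failures",
        "checkpoint_completed", "checkpoint_time",
        "checkpoint_completed_time",
    },
    "Started": set(),
    "Paused": set(_CRAWL_FIELDS),
    "Stopped": _CRAWL_FIELDS | {"slave_node"},
}

# Extra fields hidden when the brick status is Faulty (list from the mail;
# "metadata" is assumed to mean the "meta" column).
FAULTY_HIDDEN = {"active", "paused", "slave_node", "crawl_status",
                 "entry", "data", "meta"}


def visible_status(monitor_status, brick_status):
    """Return a copy of one brick status row with hidden fields set to N/A."""
    if monitor_status not in HIDDEN_FIELDS:
        raise ValueError("unknown monitor status: %r" % (monitor_status,))

    hidden = set(HIDDEN_FIELDS[monitor_status])
    if brick_status.get("status") == "Faulty":
        hidden |= FAULTY_HIDDEN

    shown = {key: (NA if key in hidden else value)
             for key, value in brick_status.items()}

    # The "status" row of the matrix: only "Started" passes the brick value
    # through; every other monitor status replaces it.
    if monitor_status != "Started":
        shown["status"] = monitor_status
    return shown
```

For example, calling `visible_status("Stopped", row)` on a row whose brick status is "Active" reports the status as "Stopped" and blanks `slave_node` and the crawl counters, while `last_synced` and `failures` keep their values.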
>
> --
> regards
> Aravinda
>
>
> On 02/03/2015 11:00 PM, Aravinda wrote:
> > Today we discussed the Geo-rep Status design. Summary of the
> > discussion:
> >
> > - No use case for the "Deletes pending" column; should we retain it?
> > - No separate column for Active/Passive. A worker can be Active/Passive
> >   only when it is Stable (it can't be both Faulty and Active).
> > - Rename the "Not Started" status to "Created".
> > - Checkpoint columns will be retained in the Status output until we
> >   support multiple checkpoints. Three columns instead of a single
> >   column (Completed, Checkpoint time and Completion time).
> > - We are still unsure what numbers "Files Pending" and "Files Synced"
> >   should show. Geo-rep can't map the numbers to an exact count on disk.
> >   Venky suggested showing Entry, Data and Metadata pending as three
> >   columns (removing "Files Pending" and "Files Synced").
> > - Rename "Files Skipped" to "Failures".
> >
> > Status output proposed:
> > -----------------------
> > MASTER NODE - Master node hostname/IP
> > MASTER VOL - Master volume name
> > MASTER BRICK - Master brick path
> > SLAVE USER - Slave user to which geo-rep is established.
> > SLAVE - Slave host and Volume name (HOST::VOL format)
> > STATUS - Created/Initializing../Started/Active/Passive/Stopped/Faulty
> > LAST SYNCED - Last synced time (based on stime xattr)
> > CRAWL STATUS - Hybrid/History/Changelog
> > CHECKPOINT STATUS - Yes/No/N/A
> > CHECKPOINT TIME - Checkpoint Set Time
> > CHECKPOINT COMPLETED - Checkpoint Completion Time
> >
> > Not yet decided
> > ---------------
> > FILES SYNCD - Number of Files Synced
> > FILES PENDING - Number of Files Pending
> > DELETES PENDING - Number of Deletes Pending
> > FILES SKIPPED - Number of Files skipped
> > ENTRIES - Create/Delete/MKDIR/RENAME etc
> > DATA - Data operations
> > METADATA - SETATTR, SETXATTR etc
> >
> > Let me know your suggestions.
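The counter behaviour discussed in this thread (Entry, Data and Metadata pending are per-session numbers that reset when a worker restarts, with the previous session stats logged first, while the Failures count persists) could be sketched as below. This is only an illustration, not the gsyncd implementation. `SessionCounters`, `WorkerStatus` and `on_restart` are hypothetical names.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("georep-status-sketch")


@dataclass
class SessionCounters:
    """Pending operation counts for the current worker session only."""
    entry: int = 0
    data: int = 0
    meta: int = 0


class WorkerStatus:
    """Status of one worker: session counters plus a persistent failure count."""

    def __init__(self, failures=0):
        self.failures = failures          # kept across worker restarts
        self.pending = SessionCounters()  # reset on every worker restart

    def on_restart(self):
        """Log the previous session stats, then reset the session counters."""
        previous = self.pending
        log.info("previous session stats: entry=%d data=%d meta=%d",
                 previous.entry, previous.data, previous.meta)
        self.pending = SessionCounters()
        return previous
```

Keeping the session counters in their own object makes the reset a single assignment and leaves `failures` untouched, which matches the distinction drawn between session information and persistent status.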
> >
> > --
> > regards
> > Aravinda
> >
> >
> > On 02/02/2015 04:51 PM, Aravinda wrote:
> >> Thanks Sahina, replied inline.
> >>
> >> --
> >> regards
> >> Aravinda
> >>
> >> On 02/02/2015 12:55 PM, Sahina Bose wrote:
> >>>
> >>> On 01/28/2015 04:07 PM, Aravinda wrote:
> >>>> Background
> >>>> ----------
> >>>> We have `status` and `status detail` commands for GlusterFS
> >>>> geo-replication. This mail is about fixing the existing issues in
> >>>> these command outputs. Let us know if we need any other columns
> >>>> that would help users get a meaningful status.
> >>>>
> >>>> Existing output
> >>>> ---------------
> >>>> Status command output
> >>>> MASTER NODE - Master node hostname/IP
> >>>> MASTER VOL - Master volume name
> >>>> MASTER BRICK - Master brick path
> >>>> SLAVE - Slave host and Volume name (HOST::VOL format)
> >>>> STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
> >>>> CHECKPOINT STATUS - Details about Checkpoint completion
> >>>> CRAWL STATUS - Hybrid/History/Changelog
> >>>>
> >>>> Status detail -
> >>>> MASTER NODE - Master node hostname/IP
> >>>> MASTER VOL - Master volume name
> >>>> MASTER BRICK - Master brick path
> >>>> SLAVE - Slave host and Volume name (HOST::VOL format)
> >>>> STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
> >>>> CHECKPOINT STATUS - Details about Checkpoint completion
> >>>> CRAWL STATUS - Hybrid/History/Changelog
> >>>> FILES SYNCD - Number of Files Synced
> >>>> FILES PENDING - Number of Files Pending
> >>>> BYTES PENDING - Bytes pending
> >>>> DELETES PENDING - Number of Deletes Pending
> >>>> FILES SKIPPED - Number of Files skipped
> >>>>
> >>>>
> >>>> Issues with existing status and status detail:
> >>>> ----------------------------------------------
> >>>>
> >>>> 1. Active/Passive and Stable/Faulty status are mixed up: the same
> >>>>    column is used to show both Active/Passive status and
> >>>>    Stable/Faulty status.
> >>>>    If the Active node goes Faulty, it is difficult to tell from
> >>>>    the status whether the Active node or the Passive one is Faulty.
> >>>> 2. Info about last synced time: unless we set a checkpoint, it is
> >>>>    difficult to know up to what time data has been synced to the
> >>>>    slave. For example, if an admin wants to know whether all the
> >>>>    files created 15 minutes ago have been synced, that is not
> >>>>    possible without setting a checkpoint.
> >>>> 3. Wrong values in metrics.
> >>>> 4. When multiple bricks are present on the same node, Status shows
> >>>>    Faulty when any one of the workers on that node is faulty.
> >>>>
> >>>> Changes:
> >>>> --------
> >>>> 1. Active nodes will be prefixed with * to identify them as active.
> >>>>    (In XML output, an active tag will be introduced with values 0
> >>>>    or 1.)
> >>>> 2. A new column will show the last synced time, which minimizes the
> >>>>    need for the checkpoint feature. Checkpoint status will be shown
> >>>>    only in status detail.
> >>>> 3. Checkpoint Status is removed. A separate Checkpoint command will
> >>>>    be added to the gluster CLI. (We can introduce a multiple
> >>>>    checkpoint feature with this change.)
> >>>> 4. Status values will be "Not
> >>>>    Started/Initializing/Started/Faulty/Stopped". Stable is changed
> >>>>    to "Started".
> >>>> 5. A Slave User column will be introduced to show the user for
> >>>>    which the geo-rep session is established. (Useful in non-root
> >>>>    geo-rep.)
> >>>> 6. The Bytes pending column will be removed. It is not possible to
> >>>>    identify the delta without simulating a sync. For example, we
> >>>>    use rsync to sync data from master to slave; to know how much
> >>>>    data is to be transferred, we would have to run the rsync
> >>>>    command with the --dry-run flag before running the actual
> >>>>    command. With tar-ssh, we would have to stat all the files
> >>>>    identified for sync to calculate the total bytes to be synced.
> >>>>    Both are costly operations which degrade geo-rep performance.
> >>>>    (In future we can include these columns.)
> >>>> 7. Files pending, synced and deletes pending are only session
> >>>>    information of the worker; these numbers will not match the
> >>>>    number of files present in the filesystem. If the worker
> >>>>    restarts, the counters reset to zero. When the worker restarts,
> >>>>    it logs the previous session stats before resetting them.
> >>>> 8. Files Skipped is a persistent status across sessions. It shows
> >>>>    the exact count of files skipped. (The list of skipped GFIDs
> >>>>    can be obtained from the log file.)
> >>>> 9. Can the "Deletes Pending" column be removed?
> >>>
> >>> Is there any way to know if there are errors syncing any of the
> >>> files? Which column would that reflect in?
> >> The "Skipped" column shows the number of files that failed to sync
> >> to the Slave.
> >>
> >>> Is the last synced time the least of the synced times across the
> >>> nodes?
> >> Status output will have one entry for each brick, so we are planning
> >> to display the last synced time from that brick.
> >>>
> >>>
> >>>>
> >>>> Example output
> >>>>
> >>>> MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE USER    SLAVE             STATUS     LAST SYNCED            CRAWL
> >>>> ----------------------------------------------------------------------------------------------------------------
> >>>> * fedoravm1    gvm           /gfs/b1         root          fedoravm3::gvs    Started    2014-05-10 03:07 pm    Changelog
> >>>>   fedoravm2    gvm           /gfs/b2         root          fedoravm4::gvs    Started    2014-05-10 03:07 pm    Changelog
> >>>>
> >>>> New Status columns
> >>>>
> >>>> ACTIVE_PASSIVE - * if Active else none.
> >>>> MASTER NODE - Master node hostname/IP
> >>>> MASTER VOL - Master volume name
> >>>> MASTER BRICK - Master brick path
> >>>> SLAVE USER - Slave user to which geo-rep is established.
> >>>> SLAVE - Slave host and Volume name (HOST::VOL format)
> >>>> STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
> >>>> LAST SYNCED - Last synced time (based on stime xattr)
> >>>> CHECKPOINT STATUS - Details about Checkpoint completion
> >>>> CRAWL STATUS - Hybrid/History/Changelog
> >>>> FILES SYNCD - Number of Files Synced
> >>>> FILES PENDING - Number of Files Pending
> >>>> DELETES PENDING - Number of Deletes Pending
> >>>> FILES SKIPPED - Number of Files skipped
> >>>>
> >>>>
> >>>> XML output
> >>>> active
> >>>> master_node
> >>>> master_node_uuid
> >>>> master_brick
> >>>> slave_user
> >>>> slave
> >>>> status
> >>>> last_synced
> >>>> crawl_status
> >>>> files_syncd
> >>>> files_pending
> >>>> deletes_pending
> >>>> files_skipped
> >>>>
> >>>>
> >>>> Checkpoints
> >>>> ===========
> >>>> New set of Gluster CLI commands will be introduced for Checkpoints.
> >>>>
> >>>> gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint create <NAME> <DATE>
> >>>> gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint delete <NAME>
> >>>> gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint delete all
> >>>> gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint status [<NAME>]
> >>>> gluster volume geo-replication <VOLNAME> checkpoint status # For all geo-rep sessions for that volume
> >>>> gluster volume geo-replication checkpoint status # For all geo-rep sessions for all volumes
> >>>>
> >>>>
> >>>> Checkpoint Status:
> >>>>
> >>>> SESSION                     NAME    Completed    Checkpoint Time        Completion Time
> >>>> -----------------------------------------------------------------------------------------
> >>>> gvm->root@fedoravm3::gvs    Chk1    Yes          2014-11-30 11:30 pm    2014-12-01 02:30 pm
> >>>> gvm->root@fedoravm3::gvs    Chk2    No           2014-12-01 10:00 pm    N/A
> >>>
> >>> Can the time information have the timezone information as well?
> >>> Or is this UTC time?
> >>> (Same comment for last synced time)
> >> Sure. Will have UTC time in Status output.
> >>>
> >>>>
> >>>> XML output:
> >>>> session
> >>>> master_uuid
> >>>> name
> >>>> completed
> >>>> checkpoint_time
> >>>> completion_time
> >>>>
> >>>>
> >>>> --
> >>>> regards
> >>>> Aravinda
> >>>> _______________________________________________
> >>>> Gluster-devel mailing list
> >>>> Gluster-devel@xxxxxxxxxxx
> >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
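On the UTC question raised in the checkpoint discussion above, one possible way to render checkpoint rows with explicit UTC timestamps is sketched below. The helper names `format_utc` and `checkpoint_row`, the epoch-seconds inputs and the output format string are assumptions made for illustration. The thread itself only settles that the Status output will show UTC time.

```python
from datetime import datetime, timezone


def format_utc(epoch_seconds):
    """Format an epoch timestamp as an explicit UTC string, or N/A if unset."""
    if epoch_seconds is None:
        return "N/A"
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


def checkpoint_row(session, name, checkpoint_time, completion_time=None):
    """Build one checkpoint status row using the XML field names from the mail.

    completion_time is None while the checkpoint has not completed yet.
    """
    return {
        "session": session,
        "name": name,
        "completed": "Yes" if completion_time is not None else "No",
        "checkpoint_time": format_utc(checkpoint_time),
        "completion_time": format_utc(completion_time),
    }


if __name__ == "__main__":
    # Hypothetical checkpoint that is set but not yet completed.
    print(checkpoint_row("gvm->root@fedoravm3::gvs", "Chk2", 1417471200))
```

Putting the "UTC" suffix in the string itself means the same formatter can serve both the checkpoint columns and the last synced column without readers having to guess the timezone.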