Re: Performance improvements to Gluster Geo-replication

Shyam <srangana@xxxxxxxxxx> · Mon, 31 Aug 2015 09:47:55 -0400

On 08/31/2015 03:17 AM, Aravinda wrote:
Following Changes/ideas identified to improve the Geo-replication
Performance. Please add your ideas/issues to the list

1. Entry stime and Data/Meta stime
----------------------------------
Now we use only one xattr to maintain the state of sync, called
stime. When a Geo-replication worker restarts, it starts from that
stime and sync files.

     get_changes from <STIME> to <CURRENT TIME>
         perform <ENTRY> operations
         perform <META> operations
         perform <DATA> operations

If data operation is failed worker crashes and restarts and reprocess
the changelogs again. Entry, Meta and Data operations will be
retried. If we maintain entry_stime seperately then we can avoid
reprocessing of entry operations which are completed previously.

This seems like a good thing to do.

Here is something more that could be done (I am not well aware of 
geo-rep internals so maybe this cannot be done),
- Why not maintain a 'mark', till which even ENTRY/META operations are 
performed, so that even when failures occur in ENTRY/META operation 
queue, we need to restart from the mark and not all the way from the 
beginning STIME.

I am not sure where such a 'mark' can be maintained, unless the 
processed get_changes are ordered and written to disk, or ordered 
idempotently in memory each time.

2. In case of Rsync/Tar failure, do not repeat Entry Operations
---------------------------------------------------------------
In case of Rsync/Tar failures, Changelogs are reprocessed
again. Instead re trigger only Rsync/Tar job for those list of files
which are failed.

(this is more for my understanding)
I assume that this retry is within the same STIME -> NOW1 period. IOW, 
if the re-trigger of the tar/rsync is going to occur in the next sync 
interval, then I would assume that ENTRY/META for NOW1 -> NOW would be 
repeated, correct? The same is true for the above as well, i.e all 
ENTRY/META operations that are completed between STIME and NOW1 is not 
repeated, but events between NOW1 to NOW is, correct?

3. Better Rsync Queue
---------------------
Now Geo-rep has a Rsync/Tar queue called PostBox. Sync
jobs(configurable, default is 3) will empty the Post Box and feeds it
to Rsync/Tar process. Second sync job may not find any items to sync,
only first job may overloaded. To avoid this, introduce a batch size
to PostBox so that each sync jobs gets equal number of files to sync.

Do you want to consider round-robin of entries to the sync jobs, 
something that we did in rebalance, instead of a batch size?

A batch size can again be consumed by a single sync process, and the 
next batch by the next one so on. Maybe a round-robin distribution of 
files to sync from the post-box to each sync thread may help.

4. Handling the Tracebacks
--------------------------
Collect the list of Tracebacks which are not yet handled, and look for
posibility of handling it in run time. With this, workers crash will
be minimized so that we can avoid initializing and changelogs
reprocess efforts.

5. SSH failure handling
-----------------------
If Slave node goes down, the Master worker connected to it will go to
Faulty and restarts. If we can handle SSH failures intelligently, we
can reestablish the SSH connection instead of restarting Geo-rep
worker. With this change, Active/Passive switch for Network failures
can be avoided.

6. On Worker restart, Utilizing Changelogs which are in .processing
directory
--------------------------------------------------------------------
On Worker restart, Start time for Geo-rep is previously updated
stime. Geo-rep re-parses the Changelogs from Brick backend to Working
directory even though those changelogs parsed previously but stime is
not updated due to failures in sync.

     1. On Geo-rep restart, Delete all files in .processing/cache and
     move all the changelogs available in .processing directory to
     .processing/cache
     2. In Changelog API, look for Changelog file name in cache before
     parsing it.
     3. If available in cache, move it to .processing
     4. else parse it and generate parsed changelog in .processing

I did not understand the above, but that's probably just me as I am not 
fully aware of change log process yet :)
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel