Re: Performance improvements to Gluster Geo-replication

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08/31/2015 03:17 AM, Aravinda wrote:
Following Changes/ideas identified to improve the Geo-replication
Performance. Please add your ideas/issues to the list

1. Entry stime and Data/Meta stime
----------------------------------
Now we use only one xattr to maintain the state of sync, called
stime. When a Geo-replication worker restarts, it starts from that
stime and sync files.

     get_changes from <STIME> to <CURRENT TIME>
         perform <ENTRY> operations
         perform <META> operations
         perform <DATA> operations

If data operation is failed worker crashes and restarts and reprocess
the changelogs again. Entry, Meta and Data operations will be
retried. If we maintain entry_stime seperately then we can avoid
reprocessing of entry operations which are completed previously.

This seems like a good thing to do.

Here is something more that could be done (I am not well aware of geo-rep internals so maybe this cannot be done), - Why not maintain a 'mark', till which even ENTRY/META operations are performed, so that even when failures occur in ENTRY/META operation queue, we need to restart from the mark and not all the way from the beginning STIME.

I am not sure where such a 'mark' can be maintained, unless the processed get_changes are ordered and written to disk, or ordered idempotently in memory each time.



2. In case of Rsync/Tar failure, do not repeat Entry Operations
---------------------------------------------------------------
In case of Rsync/Tar failures, Changelogs are reprocessed
again. Instead re trigger only Rsync/Tar job for those list of files
which are failed.

(this is more for my understanding)
I assume that this retry is within the same STIME -> NOW1 period. IOW, if the re-trigger of the tar/rsync is going to occur in the next sync interval, then I would assume that ENTRY/META for NOW1 -> NOW would be repeated, correct? The same is true for the above as well, i.e all ENTRY/META operations that are completed between STIME and NOW1 is not repeated, but events between NOW1 to NOW is, correct?



3. Better Rsync Queue
---------------------
Now Geo-rep has a Rsync/Tar queue called PostBox. Sync
jobs(configurable, default is 3) will empty the Post Box and feeds it
to Rsync/Tar process. Second sync job may not find any items to sync,
only first job may overloaded. To avoid this, introduce a batch size
to PostBox so that each sync jobs gets equal number of files to sync.

Do you want to consider round-robin of entries to the sync jobs, something that we did in rebalance, instead of a batch size?

A batch size can again be consumed by a single sync process, and the next batch by the next one so on. Maybe a round-robin distribution of files to sync from the post-box to each sync thread may help.



4. Handling the Tracebacks
--------------------------
Collect the list of Tracebacks which are not yet handled, and look for
posibility of handling it in run time. With this, workers crash will
be minimized so that we can avoid initializing and changelogs
reprocess efforts.


5. SSH failure handling
-----------------------
If Slave node goes down, the Master worker connected to it will go to
Faulty and restarts. If we can handle SSH failures intelligently, we
can reestablish the SSH connection instead of restarting Geo-rep
worker. With this change, Active/Passive switch for Network failures
can be avoided.


6. On Worker restart, Utilizing Changelogs which are in .processing
directory
--------------------------------------------------------------------
On Worker restart, Start time for Geo-rep is previously updated
stime. Geo-rep re-parses the Changelogs from Brick backend to Working
directory even though those changelogs parsed previously but stime is
not updated due to failures in sync.

     1. On Geo-rep restart, Delete all files in .processing/cache and
     move all the changelogs available in .processing directory to
     .processing/cache
     2. In Changelog API, look for Changelog file name in cache before
     parsing it.
     3. If available in cache, move it to .processing
     4. else parse it and generate parsed changelog in .processing


I did not understand the above, but that's probably just me as I am not fully aware of change log process yet :)
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux