Re: no progress in geo-replication

Dear Dietmar,


I am very interested in helping you with this geo-replication issue, since we also run a setup in which geo-replication is crucial for the backup procedure. I have only had a quick look at this so far, so for the moment I can just make a few suggestions.

Regarding your question

> is there any suitable setting in the gluster environment that would influence the speed of the processing (current settings attached)?

you could run
gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config sync_jobs 9


in order to increase the number of rsync processes.
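
If it helps, you can check the current value first by omitting the new value (same session names as above; the default is 3 as far as I know):

gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config sync_jobs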

Furthermore, the following is taken from https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/recommended_practices3:


Performance Tuning

When the following option is set, it has been observed that there is an increase in geo-replication performance. On the slave volume, run the following command:

# gluster volume set SLAVE_VOL batch-fsync-delay-usec 0
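
If I remember correctly the full option name is storage.batch-fsync-delay-usec, so the current value on the slave volume should be visible with something like this (SLAVE_VOL being a placeholder for your actual slave volume name):

# on a slave node
gluster volume get SLAVE_VOL storage.batch-fsync-delay-usec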

Can you verify that the changelog files are being consumed?
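
One rough way to check (a sketch, using the brick path from your mail below; adjust it for each brick) is to watch whether the backlog of pending changelogs shrinks over time:

# the count should go down if the changelogs are being consumed
watch -n 300 "ls /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing | wc -l"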


Regards,

Felix

On 03/03/2021 17:28, Dietmar Putz wrote:

Hi,

I'm having a problem with geo-replication. A short summary...
About two months ago I added two more nodes to a distributed replicated volume. For that purpose I stopped geo-replication, added two nodes on mvol and svol, and started a rebalance process on both sides. Once the rebalance process was finished I started geo-replication again.
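
For reference, the procedure was roughly along these lines (hostnames and brick paths here are placeholders, not the exact commands):

gluster volume geo-replication mvol1 gl-slave-01-int::svol1 stop
gluster volume add-brick mvol1 replica 2 <new-node-a>:/brickX/mvol1 <new-node-b>:/brickX/mvol1
gluster volume rebalance mvol1 start
gluster volume rebalance mvol1 status    # waited until completed, same on the svol side
gluster volume geo-replication mvol1 gl-slave-01-int::svol1 start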

After a few days, and apart from some Unicode errors, the status of the newly added brick changed from Hybrid Crawl to History Crawl. Since then there has been no progress; no files or directories have been created on svol for a couple of days.

Looking for a possible reason, I noticed that there was no /var/log/glusterfs/geo-replication-slaves/mvol1_gl-slave-01-int_svol1 directory on the newly added slave nodes.
Obviously I had forgotten to add the new svol node IP addresses to /etc/hosts on all masters. After fixing that I ran the '... execute gsec_create' and '... create push-pem force' commands again and the corresponding directories were created. Geo-replication started normally, all active sessions were in History Crawl (as shown below) and for a short while some data was transferred to svol. But for about a week nothing has changed on svol, 0 bytes transferred.
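
Written out in full, the two commands presumably looked like this (session names taken from the status output below):

gluster system:: execute gsec_create
gluster volume geo-replication mvol1 gl-slave-01-int::svol1 create push-pem force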

Meanwhile I have deleted (without reset-sync-time) and recreated the geo-rep session. The current status is as shown below, but without any last_synced date.
An entry like "last_synced_entry": 1609283145 is still visible in /var/lib/glusterd/geo-replication/mvol1_gl-slave-01-int_svol1/*status, and changelog files are continuously created in /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/<brick>/.processing.

A short time ago I changed log_level to DEBUG for a moment. Unfortunately I got an 'EOFError: Ran out of input' in gsyncd.log and the rebuild of .processing started from the beginning.
But one of the first very long lines in gsyncd.log looks like:

[2021-03-03 11:59:39.503881] D [repce(worker /brick1/mvol1):215:__call__] RepceClient: call 9163:139944064358208:1614772779.4982471 history_getchanges -> ['/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing/CHANGELOG.1609280278',...

1609280278 corresponds to Tuesday, December 29, 2020 10:17:58 PM (UTC) and would roughly fit the last_synced date.
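
The number in the changelog filename is a plain Unix timestamp, so it can be converted directly, e.g.:

date -u -d @1609280278
# -> Tue Dec 29 22:17:58 UTC 2020 (exact output format depends on locale)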

However, I have nearly 300k files in <brick>/.history/.processing, and from the log/trace it seems that every file in <brick>/.history/.processing gets processed and transferred to <brick>/.processing.
My questions so far:
First of all, is everything still OK with this geo-replication?
Do I have to wait until all changelog files in <brick>/.history/.processing are processed before transfers to svol start?
What happens if another error appears in geo-replication while these changelog files are being processed, i.e. while the crawl status is History Crawl? Does the entire process start from the beginning? Would a checkpoint be helpful for future decisions (see the sketch below)?
Is there any suitable setting in the gluster environment that would influence the speed of the processing (current settings attached)?
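
Regarding the checkpoint question above, I assume setting and watching one would look roughly like this (session names as in the status output below; I have not tried it yet):

gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config checkpoint now
gluster volume geo-replication mvol1 gl-slave-01-int::svol1 status detail    # shows whether the checkpoint is completed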


I hope someone can help...

best regards
dietmar



[ 15:17:47 ] - root@gl-master-01  /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history $ls .processing/ | wc -l
294669

[ 12:56:31 ] - root@gl-master-01  ~ $gluster volume geo-replication mvol1 gl-slave-01-int::svol1 status

MASTER NODE         MASTER VOL    MASTER BRICK     SLAVE USER    SLAVE                     SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------
gl-master-01-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-05-int    Active     History Crawl    2020-12-29 23:00:48
gl-master-01-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-03-int    Active     History Crawl    2020-12-29 23:05:45
gl-master-05-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-03-int    Active     History Crawl    2021-02-20 17:38:38
gl-master-06-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Passive    N/A              N/A
gl-master-03-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-05-int    Passive    N/A              N/A
gl-master-03-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-04-int    Active     History Crawl    2020-12-29 23:07:34
gl-master-04-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Active     History Crawl    2020-12-29 23:07:22
gl-master-04-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-01-int    Passive    N/A              N/A
gl-master-02-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-01-int    Passive    N/A              N/A
gl-master-02-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Passive    N/A              N/A
[ 13:14:47 ] - root@gl-master-01  ~ $


________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
