Hi There,
We have a Gluster setup with three master nodes in replicated mode and one slave node
with geo-replication.
# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick
master1 |
master2 | ---------- geo-replication ----------> drtier1data
master3 |

We added the master3 node a few months back; the initial setup consisted of two master nodes and one geo-replicated slave (drtier1data).
Geo-replication had been functioning well with the initial two master nodes (master1 and master2), with master1 Active and master2 Passive. Today, however, it suddenly stopped and became stuck in a loop of Initializing... / Active / Faulty on master1, while master2 remained Passive.
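The flapping is easy to see by simply polling the status (the 5-second interval below is arbitrary); the STATUS column for master1 cycles through Initializing..., Active and Faulty:

# watch -n 5 'gluster volume geo-replication tier1data status'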
Upon checking gsyncd.log on master1, we observed the worker repeatedly dying with "Gluster Mount process exited [{error=ENOTCONN}]" (please refer to the attached logs for more details). The session status currently shows:
# gluster volume geo-replication tier1data status

MASTER NODE    MASTER VOL    MASTER BRICK                SLAVE USER    SLAVE                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------
master1        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data    N/A           Faulty     N/A             N/A
master2        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data                  Passive    N/A             N/A
Suspecting an issue on drtier1data (the slave), I restarted Gluster on the slave node and also rebooted the drtier1data server itself, without any luck.
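For clarity, the restart on the slave was the standard service restart (on a systemd-based install):

# systemctl restart glusterd

followed by a full reboot of the drtier1data server.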
After that, I ran the following command on master1 to get the log-file path for the geo-replication session, and got the following error:
# gluster volume geo-replication tier1data drtier1data::drtier1data config log-file
Staging failed on master3. Error: Geo-replication session between tier1data and drtier1data::drtier1data
does not exist.
geo-replication command failed
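As far as I understand, each node keeps its geo-replication session metadata under /var/lib/glusterd/geo-replication/ (the session directory name should match the working dir visible in the logs, tier1data_drtier1data_drtier1data), so a quick comparison across the nodes should confirm whether master3 is simply missing that session directory:

# run on master1, master2 and master3:
ls /var/lib/glusterd/geo-replication/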
master3 was the node we added a few months back; it was never added to the geo-replication session, yet geo-replication kept working until today. After that, I forcefully stopped geo-replication, thinking that a restart might fix the issue. However, geo-replication now refuses to start and gives the same error:

# gluster volume geo-replication tier1data drtier1data::drtier1data start force
Staging failed on master3. Error: Geo-replication session between tier1data and drtier1data::drtier1data does not exist.
geo-replication command failed
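One thing I have not yet dared to run: re-creating the session with force, which, if I read the documentation correctly, only re-registers the session on all current nodes (including master3) and does not trigger a full resync, because the last-synced position is kept on the bricks:

# gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force

Is that the right/safe way forward here, or would it risk a full re-crawl?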
Can anyone please suggest what I should do next to resolve this issue? There is 5TB of data in this volume, so I don't want to resync the entire dataset to drtier1data; instead, I want to resume the sync from where it last stopped.
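My understanding is that the sync position should survive all of this, because geo-replication stores it as an stime extended attribute on each brick root (the exact attribute name embeds the master and slave volume UUIDs), e.g.:

# getfattr -h -d -m 'trusted.glusterfs.*.stime' -e hex /opt/tier1data2019/brick

so resuming from the last checkpoint should be possible once the session itself is healthy again.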
Thanks in advance for any guidance/help.
Kind regards,
Anant
Attachment: commands
gsyncd.log (master1):

[2024-01-22 15:06:04.581935] I [master(worker /opt/tier1data2019/brick):1525:crawl] _GMaster: slave's time [{stime=(1705935946, 0)}]
[2024-01-22 15:06:11.937102] I [master(worker /opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=23}, {return_code=0}, {duration=4.5536}]
[2024-01-22 15:06:13.259436] I [master(worker /opt/tier1data2019/brick):1439:process] _GMaster: Entry Time Taken [{UNL=4}, {RMD=0}, {CRE=27}, {MKN=0}, {MKD=0}, {REN=0}, {LIN=0}, {SYM=0}, {duration=1.5474}]
[2024-01-22 15:06:13.259582] I [master(worker /opt/tier1data2019/brick):1449:process] _GMaster: Data/Metadata Time Taken [{SETA=0}, {meta_duration=0.0000}, {SETX=0}, {XATT=0}, {DATA=23}, {data_duration=5.8782}]
[2024-01-22 15:06:13.259822] I [master(worker /opt/tier1data2019/brick):1459:process] _GMaster: Batch Completed [{mode=live_changelog}, {duration=7.4315}, {changelog_start=1705935962}, {changelog_end=1705935962}, {num_changelogs=1}, {stime=(1705935961, 0)}, {entry_stime=(1705935961, 0)}]
[2024-01-22 15:06:18.268149] I [master(worker /opt/tier1data2019/brick):1525:crawl] _GMaster: slave's time [{stime=(1705935961, 0)}]
[2024-01-22 15:06:24.17919] I [master(worker /opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=5}, {num_files=19}, {return_code=24}, {duration=2.1243}]
[2024-01-22 15:06:24.29803] W [master(worker /opt/tier1data2019/brick):1411:process] <top>: incomplete sync, retrying changelogs [{files=['CHANGELOG.1705935977']}]
[2024-01-22 15:06:25.241017] I [master(worker /opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=1}, {num_files=18}, {return_code=0}, {duration=0.3331}]
[2024-01-22 15:06:26.505673] I [master(worker /opt/tier1data2019/brick):1439:process] _GMaster: Entry Time Taken [{UNL=0}, {RMD=0}, {CRE=0}, {MKN=0}, {MKD=0}, {REN=0}, {LIN=0}, {SYM=0}, {duration=0.0000}]
[2024-01-22 15:06:26.505828] I [master(worker /opt/tier1data2019/brick):1449:process] _GMaster: Data/Metadata Time Taken [{SETA=0}, {meta_duration=0.0000}, {SETX=0}, {XATT=0}, {DATA=0}, {data_duration=1705935986.5058}]
[2024-01-22 15:06:26.506065] I [master(worker /opt/tier1data2019/brick):1459:process] _GMaster: Batch Completed [{mode=live_changelog}, {duration=6.6757}, {changelog_start=1705935977}, {changelog_end=1705935977}, {num_changelogs=1}, {stime=(1705935976, 0)}, {entry_stime=(1705935976, 0)}]
[2024-01-22 15:06:36.522529] I [master(worker /opt/tier1data2019/brick):1525:crawl] _GMaster: slave's time [{stime=(1705935976, 0)}]
[2024-01-22 15:06:43.504798] I [master(worker /opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=1}, {num_files=10}, {return_code=0}, {duration=4.7454}]
[2024-01-22 15:06:44.966866] I [master(worker /opt/tier1data2019/brick):1439:process] _GMaster: Entry Time Taken [{UNL=7}, {RMD=0}, {CRE=16}, {MKN=0}, {MKD=0}, {REN=0}, {LIN=0}, {SYM=0}, {duration=0.8898}]
[2024-01-22 15:06:44.967015] I [master(worker /opt/tier1data2019/brick):1449:process] _GMaster: Data/Metadata Time Taken [{SETA=0}, {meta_duration=0.0000}, {SETX=0}, {XATT=0}, {DATA=10}, {data_duration=6.2818}]
[2024-01-22 15:06:44.967248] I [master(worker /opt/tier1data2019/brick):1459:process] _GMaster: Batch Completed [{mode=live_changelog}, {duration=7.1792}, {changelog_start=1705935992}, {changelog_end=1705935992}, {num_changelogs=1}, {stime=(1705935991, 0)}, {entry_stime=(1705935991, 0)}]
[2024-01-22 15:06:49.975883] I [master(worker /opt/tier1data2019/brick):1525:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-22 15:06:51.254171] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-22 15:06:54.550737] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-22 15:07:04.579144] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-22 15:07:04.579312] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-22 15:07:04.662445] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-22 15:07:06.531760] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.8691}]
[2024-01-22 15:07:06.532051] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-22 15:07:07.569779] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0376}]
[2024-01-22 15:07:07.569952] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-22 15:07:09.579331] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-22 15:07:09.579695] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1705936029}]
[2024-01-22 15:07:09.587607] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-22 15:07:09.588185] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-22 15:07:09.588346] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1705936029}, {entry_stime=(1705935991, 0)}]
[2024-01-22 15:07:12.163526] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-22 15:07:12.287395] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-22 15:07:12.575075] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-22 15:07:12.576644] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-22 15:07:22.602555] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-22 15:07:22.602743] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-22 15:07:22.687116] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-22 15:07:24.523975] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.8366}]
[2024-01-22 15:07:24.524267] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-22 15:07:25.561446] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0371}]
[2024-01-22 15:07:25.561643] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-22 15:07:27.571655] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-22 15:07:27.572050] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1705936047}]
[2024-01-22 15:07:27.579249] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-22 15:07:27.579776] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-22 15:07:27.579936] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1705936047}, {entry_stime=(1705935991, 0)}]
[2024-01-22 15:07:28.580880] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-22 15:07:28.703109] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
gfchangelog log:

[2024-01-22 15:07:07.576515 +0000] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/opt/tier1data2019/brick}, {notify_filter=1}]
[2024-01-22 15:07:07.578354 +0000] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 17
[2024-01-22 15:07:09.588416 +0000] I [MSGID: 132035] [gf-history-changelog.c:841:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1705935991}, {end=1705936029}]
[2024-01-22 15:07:09.588498 +0000] I [MSGID: 132019] [gf-history-changelog.c:759:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1643536439}, {max=1705936022}, {total_changelogs=4516556}]
[2024-01-22 15:07:11.162539 +0000] I [MSGID: 132036] [gf-history-changelog.c:959:gf_history_changelog] 0-gfchangelog: FINAL [{from=1705935992}, {to=1705936022}, {changes=3}]
[2024-01-22 15:07:25.568930 +0000] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/opt/tier1data2019/brick}, {notify_filter=1}]
[2024-01-22 15:07:25.570835 +0000] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 17
[2024-01-22 15:07:27.580006 +0000] I [MSGID: 132035] [gf-history-changelog.c:841:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1705935991}, {end=1705936047}]
[2024-01-22 15:07:27.580077 +0000] I [MSGID: 132019] [gf-history-changelog.c:759:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1643536439}, {max=1705936037}, {total_changelogs=4516557}]
[2024-01-22 15:07:27.580174 +0000] I [MSGID: 132036] [gf-history-changelog.c:959:gf_history_changelog] 0-gfchangelog: FINAL [{from=1705935992}, {to=1705936037}, {changes=4}]
[2024-01-22 15:07:42.564182 +0000] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/opt/tier1data2019/brick}, {notify_filter=1}]
[2024-01-22 15:07:42.566029 +0000] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 17
[2024-01-22 15:07:44.574423 +0000] I [MSGID: 132035] [gf-history-changelog.c:841:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1705935991}, {end=1705936064}]
[2024-01-22 15:07:44.574504 +0000] I [MSGID: 132019] [gf-history-changelog.c:759:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1643536439}, {max=1705936052}, {total_changelogs=4516558}]
[2024-01-22 15:07:44.574604 +0000] I [MSGID: 132036] [gf-history-changelog.c:959:gf_history_changelog] 0-gfchangelog: FINAL [{from=1705935992}, {to=1705936052}, {changes=5}]
[2024-01-22 15:07:59.824057 +0000] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/opt/tier1data2019/brick}, {notify_filter=1}]
[2024-01-22 15:07:59.826176 +0000] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 17