Hi all,

yes, I saw that there is a thread about geo-replication with nearly the same problem. I read it, but I think my problem is a bit different.
I created two volumes: the primary volume "privol01" and the secondary volume "secvol01". All hosts have the same packages installed; all hosts are Debian 12 with Gluster version 10.05, so even rsync is identical on every host. (I installed one host as a VM and cloned it.)
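For what it's worth, the versions can be compared quickly across all nodes; a rough sketch (it assumes root SSH access between the nodes, which is not part of the geo-replication setup itself):

  # print the rsync and gluster versions on every node
  for h in p01 p02 p03 s01 s02 s03; do
      echo "== $h =="
      ssh root@$h 'rsync --version | head -1; glusterfs --version | head -1'
  done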
I have:

Volume Name: privol01
Type: Replicate
Volume ID: 93ace064-2862-41fe-9606-af5a4af9f5ab
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: p01:/gluster/brick
Brick2: p02:/gluster/brick
Brick3: p03:/gluster/brick

and:

Volume Name: secvol01
Type: Replicate
Volume ID: 4ebb7768-51da-446c-a301-dc3ea49a9ba2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: s01:/gluster/brick
Brick2: s02:/gluster/brick
Brick3: s03:/gluster/brick

Name resolution of the hosts works in every direction.

This is what I did:

On all secondary hosts:
  groupadd geogruppe
  useradd -G geogruppe -m geobenutzer
  passwd geobenutzer
  ln -s /usr/sbin/gluster /usr/bin

On one of the secondary hosts (a way to double-check this step is sketched after the command list):
  gluster-mountbroker setup /var/mountbroker geogruppe
  gluster-mountbroker add secvol01 geobenutzer

On one of the primary hosts:
  ssh-keygen
  ssh-copy-id geobenutzer@s01.gluster
  gluster-georep-sshkey generate
  gluster v geo-replication privol01 geobenutzer@s01.gluster::secvol01 create push-pem
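As far as I know, the result of the mountbroker step can be verified on the secondary side like this (just a sketch, I have not pasted its output here):

  gluster-mountbroker status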
On one of the secondary hosts:
  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh

All the commands exited without an error message. I then restarted glusterd on all nodes and started the session on the primary host:

  gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 start
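Side note: as far as I can tell from the geo-replication documentation, that script expects the mountbroker user and the two volume names as arguments; with the names used here the call would be roughly:

  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geobenutzer privol01 secvol01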
The status is showing:

PRIMARY NODE    PRIMARY VOL    PRIMARY BRICK     SECONDARY USER    SECONDARY                            SECONDARY NODE    STATUS     CRAWL STATUS    LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
p03             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01                      Passive    N/A             N/A
p02             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01                      Passive    N/A             N/A
p01             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01    N/A               Faulty     N/A             N/A
For p01 the status keeps cycling: from "Initializing..." to status=Active with crawl status "History Crawl", then to status=Faulty, and then back to "Initializing...".
But only for the primary host p01. Here is the log from p01:
--------------------------------
[2024-02-13 18:30:06.64585] I [gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-02-13 18:30:06.65004] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker [{brick=/gluster/brick}, {secondary_node=s01}]
[2024-02-13 18:30:06.147194] I [resource(worker /gluster/brick):1387:connect_remote] SSH: Initializing SSH connection between primary and secondary...
[2024-02-13 18:30:07.777785] I [resource(worker /gluster/brick):1435:connect_remote] SSH: SSH connection between primary and secondary established. [{duration=1.6304}]
[2024-02-13 18:30:07.777971] I [resource(worker /gluster/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-02-13 18:30:08.822077] I [resource(worker /gluster/brick):1138:connect] GLUSTER: Mounted gluster volume [{duration=1.0438}]
[2024-02-13 18:30:08.823039] I [subcmds(worker /gluster/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-02-13 18:30:10.861742] I [primary(worker /gluster/brick):1661:register] _GPrimary: Working dir [{path=/var/lib/misc/gluster/gsyncd/privol01_s01.gluster_secvol01/gluster-brick}]
[2024-02-13 18:30:10.864432] I [resource(worker /gluster/brick):1291:service_loop] GLUSTER: Register time [{time=1707849010}]
[2024-02-13 18:30:10.906805] I [gsyncdstatus(worker /gluster/brick):280:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-02-13 18:30:11.7656] I [gsyncdstatus(worker /gluster/brick):252:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-02-13 18:30:11.7984] I [primary(worker /gluster/brick):1572:crawl] _GPrimary: starting history crawl [{turns=1}, {stime=(1707848760, 0)}, {etime=1707849011}, {entry_stime=None}]
[2024-02-13 18:30:12.9234] I [primary(worker /gluster/brick):1604:crawl] _GPrimary: secondary's time [{stime=(1707848760, 0)}]
[2024-02-13 18:30:12.388528] I [primary(worker /gluster/brick):2009:syncjob] Syncer: Sync Time Taken [{job=2}, {num_files=2}, {return_code=12}, {duration=0.0520}]
[2024-02-13 18:30:12.388745] E [syncdutils(worker /gluster/brick):845:errlog] Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-1_kow1tp/c343d8e67535166a0d66b71865f3f3c4.sock geobenutzer@s01:/proc/2675/cwd}, {error=12}]
[2024-02-13 18:30:12.826546] I [monitor(monitor):227:monitor] Monitor: worker died in startup phase [{brick=/gluster/brick}]
[2024-02-13 18:30:12.845687] I [gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
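If it helps: as far as I understand, rsync's exit code 12 means "error in rsync protocol data stream", i.e. rsync on one of the two ends died or could not be started properly. The rsync-related settings of the session can be listed like this (a sketch with my session name):

  gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 config | grep -i rsync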
---------------------
So the host p01 is trying to connect to s01. A look at host p02 of the primary volume shows:
-------------------
[2024-02-13 18:25:55.179385] I [gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-02-13 18:25:55.179572] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker [{brick=/gluster/brick}, {secondary_node=s01}]
[2024-02-13 18:25:55.258658] I [resource(worker /gluster/brick):1387:connect_remote] SSH: Initializing SSH connection between primary and secondary...
[2024-02-13 18:25:57.78159] I [resource(worker /gluster/brick):1435:connect_remote] SSH: SSH connection between primary and secondary established. [{duration=1.8194}]
[2024-02-13 18:25:57.78254] I [resource(worker /gluster/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-02-13 18:25:58.123291] I [resource(worker /gluster/brick):1138:connect] GLUSTER: Mounted gluster volume [{duration=1.0450}]
[2024-02-13 18:25:58.123410] I [subcmds(worker /gluster/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-02-13 18:26:00.135934] I [primary(worker /gluster/brick):1661:register] _GPrimary: Working dir [{path=/var/lib/misc/gluster/gsyncd/privol01_s01.gluster_secvol01/gluster-brick}]
[2024-02-13 18:26:00.136287] I [resource(worker /gluster/brick):1291:service_loop] GLUSTER: Register time [{time=1707848760}]
[2024-02-13 18:26:00.179157] I [gsyncdstatus(worker /gluster/brick):286:set_passive] GeorepStatus: Worker Status Change [{status=Passive}]
------------------
This primary node also connects to s01, and there it works. It must have something to do with the primary host, because if I stop the replication and start it again, the primary host tries to connect to a different secondary host and fails with the same error:
----------------
Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-1_kow1tp/c343d8e67535166a0d66b71865f3f3c4.sock geobenutzer@s01:/proc/2675/cwd}, {error=12}]
----------------
So the problem must be the primary host p01. That's the host on which I configured the passwordless SSH session.
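If it is useful, I can run further checks from p01. For example, a quick test of the interactive SSH path I set up with ssh-copy-id, plus the rsync version on the far end (just a sketch; the gsyncd worker itself uses /var/lib/glusterd/geo-replication/secret.pem, as seen in the log above):

  ssh geobenutzer@s01.gluster 'rsync --version | head -1'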
This is a test setup. I had also tried it before with two other volumes of 6 nodes each; there I had 2 faulty nodes in the primary volume.
I can start and stop the replication session from any of the primary nodes, but p01 is always Faulty.
Any help?

Stefan