Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev <ailiev+gluster@xxxxxxxxx> wrote:
>
> Hi all,
>
> I have a running geo-replication session between two clusters and I'm
> trying to figure out what is the current progress of the replication and
> possibly how much longer it will take.
>
> It has been running for quite a while now (> 1 month), but the thing is
> that both the hardware of the nodes and the link between the two
> clusters aren't that great (e.g., the volumes are backed by rotating
> disks) and the volume is somewhat sizeable (30-ish TB) and given these
> details I'm not really sure how long it is supposed to take normally.
>
> I have several bricks in the volume (same brick size and physical layout
> in both clusters) that are now showing up with a Changelog Crawl status
> and with a recent LAST_SYNCED date in the `gluster volume
> geo-replication status detail` command output which seems to be the
> desired state for all bricks. The rest of the bricks though are in
> Hybrid Crawl state and have been in that state forever.
>
> So I suppose my questions are - how can I tell if the replication
> session is somehow broken and if it's not, then is there a way for me
> to find out the progress and the ETA of the replication?
>

Please go through the status section of the docs [1], which covers this.
At present we do not have any accounting information for Hybrid Crawl,
such as how much time it will take to sync the data.
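
Since Hybrid Crawl gives no ETA, one rough way to watch progress is to
record how many bricks have moved to Changelog Crawl over time. Below is a
minimal Python sketch, assuming the `status detail` output has been saved
as plain text and that each brick's row contains its crawl state verbatim.
`count_crawl_states` is a hypothetical helper, not part of Gluster, and
"History Crawl" is included on the assumption that it can also appear in
that column.

```python
from collections import Counter

# Crawl states expected in the CRAWL STATUS column (assumed spelling).
CRAWL_STATES = ("Changelog Crawl", "History Crawl", "Hybrid Crawl")


def count_crawl_states(status_text):
    """Count brick rows per crawl state in saved `status detail` output."""
    counts = Counter()
    for line in status_text.splitlines():
        for state in CRAWL_STATES:
            if state in line:
                counts[state] += 1
                break  # a row reports exactly one crawl state
    return counts
```

Running this periodically on fresh output and comparing the counts shows
whether bricks are still leaving Hybrid Crawl. It does not give an ETA.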
> In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
> some errors like:
>
> [2020-03-31 11:48:47.81269] E [syncdutils(worker
> /data/gfs/store1/8/brick):822:errlog] Popen: command returned error
> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
> /nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x
> --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
> <brick_path> --local-node x.x.x.x
> 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
> 120 --slave-log-level INFO --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/sbin error=1
> [2020-03-31 11:48:47.81617] E [syncdutils(worker
> <brick_path>):826:logerr] Popen: ssh> failed with ValueError.
> [2020-03-31 11:48:47.390397] I [repce(agent
> <brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.
>

If you are seeing this error at regular intervals, please check your ssh
connection; it might be broken. If possible, please share the full
traceback from both master and slave so we can debug the issue.

> In the brick logs I see stuff like:
>
> [2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
>
> I don't know if these are critical, from the rest of the logs it looks
> like data is traveling between the clusters.
>
> Any help will be greatly appreciated. Thank you in advance!
>
> Best regards,
> --
> alexander iliev
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> https://lists.gluster.org/mailman/listinfo/gluster-users
>

[1].
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny
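
P.S. To check whether the ssh error in gsyncd.log really recurs at a
regular interval, something like the rough Python sketch below may help.
It assumes each log entry sits on one line in the actual file (the quoted
excerpt is wrapped by the mail client) and that the timestamp format
matches the lines quoted above. `error_gaps` and its `needle` parameter
are hypothetical names used only for illustration.

```python
import re
from datetime import datetime

# Matches error-level gsyncd.log lines, e.g.
# [2020-03-31 11:48:47.81269] E [syncdutils(worker ...):822:errlog] Popen: ...
# The fractional part varies in width, so it is matched but discarded.
ERROR_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] E \[")


def error_gaps(log_lines, needle="command returned error"):
    """Return the seconds between consecutive error lines containing `needle`."""
    stamps = []
    for line in log_lines:
        match = ERROR_LINE.match(line)
        if match and needle in line:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    # Pair each timestamp with the next one and take the difference.
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Feeding it the lines of gsyncd.log (for example `error_gaps(open(path))`)
returns a list of gaps in seconds. Roughly constant gaps would suggest the
worker keeps retrying the same failing ssh connection.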