Re: Geo-Replication Issue while upgrading

Thanks Deepu.

I will investigate this. Could you summarize the steps that would help
reproduce the issue?
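
For context, here is the rough sequence I understand from the thread so far
(a sketch: the volume and session names come from the status output below;
the exact commands are my assumption, so please correct anything that differs):

    # stop and delete the existing session
    gluster volume geo-replication code-misc sas@192.168.185.118::code-misc stop
    gluster volume geo-replication code-misc sas@192.168.185.118::code-misc delete
    # upgrade the slave nodes to 7.0, then the master nodes, then recreate:
    gluster volume geo-replication code-misc sas@192.168.185.118::code-misc create push-pem
    gluster volume geo-replication code-misc sas@192.168.185.118::code-misc start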

/sunny

On Fri, Nov 29, 2019 at 7:29 AM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
>
> Hi Sunny
> The issue seems to be a bug.
> The issue was fixed when I restarted the glusterd daemon on the slave machines. The logs on the slave end reported that the mount-broker folder was not in the vol file, and the restart resolved it.
> This might be some race condition.
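>
> A minimal sketch of the check/workaround described above (assuming the documented non-root mountbroker setup and a systemd-managed glusterd; run on each slave node):
>
>     # confirm the mountbroker entries are present in the glusterd vol file
>     grep mountbroker /etc/glusterfs/glusterd.vol
>     gluster-mountbroker status
>     # restart glusterd if the entries are missing at runtime
>     systemctl restart glusterd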
>
> On Thu, Nov 28, 2019 at 9:00 PM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
>>
>> Hi Sunny
>> I also got this error on the slave end:
>>>
>>> [2019-11-28 15:30:12.520461] I [resource(slave 192.168.185.89/home/sas/gluster/data/code-misc):1105:connect] GLUSTER: Mounting gluster volume locally...
>>>
>>> [2019-11-28 15:30:12.649425] E [resource(slave 192.168.185.89/home/sas/gluster/data/code-misc):1013:handle_mounter] MountbrokerMounter: glusterd answered       mnt=
>>>
>>> [2019-11-28 15:30:12.650573] E [syncdutils(slave 192.168.185.89/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error      cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.118_code-misc/mnt-192.168.185.89-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1  error=1
>>>
>>> [2019-11-28 15:30:12.650742] E [syncdutils(slave 192.168.185.89/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
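>>
>> For reference, these are the mountbroker options the slave's glusterd vol file needs per the upstream non-root geo-replication docs (the sas user and code-misc volume come from this session; the group name and mountbroker root below are assumptions taken from those docs):
>>
>>     # /etc/glusterfs/glusterd.vol on each slave node
>>     option mountbroker-root /var/mountbroker-root
>>     option mountbroker-geo-replication.sas code-misc
>>     option geo-replication-log-group geogroup
>>     option rpc-auth-allow-insecure on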
>>
>>
>> On Thu, Nov 28, 2019 at 6:45 PM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
>>>
>>> root@192.168.185.101/var/log/glusterfs#ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 sas@192.168.185.118 "sudo gluster volume status"
>>>
>>> **************************************************************************************************************************
>>>
>>> WARNING: This system is a restricted access system.  All activity on this system is subject to monitoring.  If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>
>>> By continuing past this point, you expressly consent to   this monitoring
>>>
>>> **************************************************************************************************************************
>>>
>>> invoking sudo in restricted SSH session is not allowed
>>>
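>>> Since the restricted shell rejects sudo over SSH, one way to get the same information is to run the command directly on the slave node (a sketch; assumes shell access on 192.168.185.118):
>>>
>>>     # run locally on 192.168.185.118 instead of over the restricted session
>>>     sudo gluster volume status code-misc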
>>>
>>> On Thu, Nov 28, 2019 at 6:04 PM Sunny Kumar <sunkumar@xxxxxxxxxx> wrote:
>>>>
>>>> Hi Deepu,
>>>>
>>>> Can you try this:
>>>>
>>>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>>>> /var/lib/glusterd/geo-replication/secret.pem -p 22
>>>> sas@192.168.185.118 "sudo gluster volume status"
>>>>
>>>> /sunny
>>>>
>>>>
>>>> On Thu, Nov 28, 2019 at 12:14 PM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
>>>> >>
>>>> >> MASTER NODE        MASTER VOL    MASTER BRICK                        SLAVE USER    SLAVE                             SLAVE NODE         STATUS     CRAWL STATUS    LAST_SYNCED
>>>> >>
>>>> >> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> >>
>>>> >> 192.168.185.89     code-misc     /home/sas/gluster/data/code-misc    sas           sas@192.168.185.118::code-misc    N/A                Faulty     N/A             N/A
>>>> >>
>>>> >> 192.168.185.101    code-misc     /home/sas/gluster/data/code-misc    sas           sas@192.168.185.118::code-misc    192.168.185.118    Passive    N/A             N/A
>>>> >>
>>>> >> 192.168.185.93     code-misc     /home/sas/gluster/data/code-misc    sas           sas@192.168.185.118::code-misc    N/A                Faulty     N/A             N/A
>>>> >
>>>> >
>>>> > On Thu, Nov 28, 2019 at 5:43 PM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
>>>> >>
>>>> >> I think it's configured properly. Should I check something else?
>>>> >>
>>>> >> root@192.168.185.89/var/log/glusterfs#ssh sas@192.168.185.118 "sudo gluster volume info"
>>>> >>
>>>> >> **************************************************************************************************************************
>>>> >>
>>>> >> WARNING: This system is a restricted access system.  All activity on this system is subject to monitoring.  If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>
>>>> >> By continuing past this point, you expressly consent to   this monitoring.-
>>>> >>
>>>> >> **************************************************************************************************************************
>>>> >>
>>>> >>
>>>> >>
>>>> >> Volume Name: code-misc
>>>> >>
>>>> >> Type: Replicate
>>>> >>
>>>> >> Volume ID: e9b6fbed-fcd0-42a9-ab11-02ec39c2ee07
>>>> >>
>>>> >> Status: Started
>>>> >>
>>>> >> Snapshot Count: 0
>>>> >>
>>>> >> Number of Bricks: 1 x 3 = 3
>>>> >>
>>>> >> Transport-type: tcp
>>>> >>
>>>> >> Bricks:
>>>> >>
>>>> >> Brick1: 192.168.185.118:/home/sas/gluster/data/code-misc
>>>> >>
>>>> >> Brick2: 192.168.185.45:/home/sas/gluster/data/code-misc
>>>> >>
>>>> >> Brick3: 192.168.185.84:/home/sas/gluster/data/code-misc
>>>> >>
>>>> >> Options Reconfigured:
>>>> >>
>>>> >> features.read-only: enable
>>>> >>
>>>> >> transport.address-family: inet
>>>> >>
>>>> >> nfs.disable: on
>>>> >>
>>>> >> performance.client-io-threads: off
>>>> >>
>>>> >>
>>>> >> On Thu, Nov 28, 2019 at 5:40 PM Sunny Kumar <sunkumar@xxxxxxxxxx> wrote:
>>>> >>>
>>>> >>> Hi Deepu,
>>>> >>>
>>>> >>> Looks like this error is generated due to SSH restrictions.
>>>> >>> Can you please check and confirm that SSH is properly configured?
>>>> >>>
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934436] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
>>>> >>> **************************************************************************************************************************
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934703] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING:
>>>> >>> This system is a restricted access system.  All activity on this
>>>> >>> system is subject to monitoring.  If information collected reveals
>>>> >>> possible criminal activity or activity that exceeds privileges,
>>>> >>> evidence of such activity may be providedto the relevant authorities
>>>> >>> for further action.
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934967] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By
>>>> >>> continuing past this point, you expressly consent to   this
>>>> >>> monitoring.- ZOHO Corporation
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.935194] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
>>>> >>> **************************************************************************************************************************
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.944369] I [repce(agent
>>>> >>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
>>>> >>> terminating on reaching EOF.
>>>> >>>
>>>> >>> /sunny
>>>> >>>
>>>> >>> On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan <sdeepugd@xxxxxxxxx> wrote:
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > ---------- Forwarded message ---------
>>>> >>> > From: deepu srinivasan <sdeepugd@xxxxxxxxx>
>>>> >>> > Date: Thu, Nov 28, 2019 at 5:32 PM
>>>> >>> > Subject: Geo-Replication Issue while upgrading
>>>> >>> > To: gluster-users <gluster-users@xxxxxxxxxxx>
>>>> >>> >
>>>> >>> >
>>>> >>> > Hi Users/Developers,
>>>> >>> > I hope you remember the last issue we faced, where geo-replication went to a Faulty state while stopping and starting the geo-replication session.
>>>> >>> >>
>>>> >>> >> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change       status=Active
>>>> >>> >> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change   status=History Crawl
>>>> >>> >> [2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)     etime=1573925383
>>>> >>> >> [2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time     stime=(1573924576, 0)
>>>> >>> >> [2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore, take out entry       retry_count=1   entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>>>> >>> >> [2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch    retry_count=1
>>>> >>> >> [2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
>>>> >>> >> [2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed  call=151945:140353273153344:1573925384.78       method=entry_ops        error=OSError
>>>> >>> >> [2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
>>>> >>> >> Traceback (most recent call last):
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
>>>> >>> >>     func(args)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
>>>> >>> >>     local.service_loop(remote)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
>>>> >>> >>     g3.crawlwrap(oneshot=True)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
>>>> >>> >>     self.crawl()
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
>>>> >>> >>     self.changelogs_batch_process(changes)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
>>>> >>> >>     self.process(batch)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
>>>> >>> >>     self.process_change(change, done, retry)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
>>>> >>> >>     failures = self.slave.server.entry_ops(entries)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
>>>> >>> >>     return self.ins(self.meth, *a)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
>>>> >>> >>     raise res
>>>> >>> >> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>>>> >>> >> [2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>> >> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase     brick=/home/sas/gluster/data/code-misc6
>>>> >>> >> [2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
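>>>> >>> >
>>>> >>> > The Errno 13 above is raised by entry_ops, which geo-replication executes on the slave side; a next step would be to inspect the permissions on that backend path (a read-only sketch, run on the node hosting the brick):
>>>> >>> >
>>>> >>> >     # check ownership and mode of the gfid path from the traceback
>>>> >>> >     ls -ld /home/sas/gluster/data/code-misc6/.glusterfs/6a/90
>>>> >>> >     stat /home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb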
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > Now, after upgrading from version 5.6 to 7.0, we got an error in geo-replication.
>>>> >>> > Scenario:
>>>> >>> >
>>>> >>> > We had a 1x3 replicated volume in each DC.
>>>> >>> > Both volumes were started, the geo-replication session was set up between them, and the files were synced. Then the geo-replication session was deleted.
>>>> >>> > We started upgrading each server to 7.0, beginning from the slave end. I followed this link --> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
>>>> >>> > After starting glusterd, I created the geo-replication session again, but it ends up in a Faulty state. Please find the logs below.
>>>> >>> >
>>>> >>> >> [2019-11-28 11:59:12.370255] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.370615] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/home/sas/gluster/data/code-misc slave_node=192.168.185.84
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.445581] I [gsyncd(agent /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.448383] I [changelogagent(agent /home/sas/gluster/data/code-misc):72:__init__] ChangelogAgent: Agent listining...
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.453881] I [gsyncd(worker /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.472862] I [resource(worker /home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.933346] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934117] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-tKcFQe/5697733f424862ab9d57e019de78aca6.sock sas@192.168.185.84 /usr/libexec/glusterfs/gsyncd slave code-misc sas@192.168.185.118::code-misc --master-node 192.168.185.89 --master-node-id a7a9688e-700c-4452-9cd6-e10d6eed5335 --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.84 --local-node-id cbafeca3-650b-4c9e-8ea6-2451ea9265dd --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 3 error=1
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system.  All activity on this system is subject to monitoring.  If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to   this monitoring.- ZOHO Corporation
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.944722] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.947575] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>>

________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


