Hi Wade,
There seems to be an issue with syncing the existing data in the
volume using the Xsync crawl.
(To give some background: when geo-rep is started, it first performs a
filesystem crawl (Xsync) and syncs all existing data to the slave; the
session then switches to CHANGELOG mode.)
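As an aside, the end of an Xsync pass is marked by a "finished hybrid crawl syncing" line in the master log, so a quick scan for that message tells you whether the initial crawl completed (illustrative Python sketch, not part of gsyncd):

```python
# Minimal sketch: scan geo-replication master log text for the
# message gsyncd emits when an Xsync (hybrid) crawl pass finishes.
def xsync_pass_finished(log_text):
    return "finished hybrid crawl syncing" in log_text

sample = ("[2015-10-08 15:31:26.471216] I "
          "[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: "
          "finished hybrid crawl syncing, stime: (1444278018, 482251)")
print(xsync_pass_finished(sample))  # True
```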
We are looking into this.
Is there any specific reason for choosing a Stripe volume? It does not
seem to have been extensively tested with geo-rep.
Thanks,
Saravana
On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
The relevant portions of the log appear to be as follows.
Everything seemed fairly normal (though quite slow) until
[2015-10-08 15:31:26.471216] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I
[syncdutils(/data/gluster1/static/brick1):220:finalize]
<top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop]
RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize]
<top>: exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] <top>: slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] <top>: worker specs: [('/data/gluster1/static/brick1', 'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor]
Monitor:
------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor]
Monitor: starting gsyncd worker
[2015-10-08 15:31:35.837651] I
[changelogagent(agent):75:__init__] ChangelogAgent: Agent
listining...
[2015-10-08 15:31:35.841150] I
[gsyncd(/data/gluster1/static/brick1):649:main_i] <top>:
syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I
[master(/data/gluster1/static/brick1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-10-08 15:31:38.543802] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster:
using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I
[master(/data/gluster1/static/brick1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-10-08 15:31:38.544924] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster:
using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I
[master(/data/gluster1/static/brick1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-10-08 15:31:38.546406] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster:
using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549632] I
[resource(/data/gluster1/static/brick1):1432:service_loop]
GLUSTER: Register time: 1444278698
[2015-10-08 15:31:38.582277] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster:
primary master with volume id
3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster:
crawl interval: 60 seconds
[2015-10-08 15:31:38.587405] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster:
primary master with volume id
3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster:
crawl interval: 60 seconds
[2015-10-08 15:31:38.593844] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1
crawls, 0 turns
[2015-10-08 15:32:38.644370] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I
[master(/data/gluster1/static/brick1):1252:crawl] _GMaster:
processing xsync changelog
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W
[master(/data/gluster1/static/brick1):803:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188,
'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html',
'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W
[master(/data/gluster1/static/brick1):803:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188,
'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html',
'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')
...
[2015-10-08 15:33:38.236779] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png', 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png', 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png', 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W
[master(/data/gluster1/static/brick1):1010:process] _GMaster:
changelogs XSYNC-CHANGELOG.1444278758 could not be processed -
moving on...
[2015-10-08 15:33:43.616425] W
[master(/data/gluster1/static/brick1):1014:process] _GMaster:
SKIPPED GFID =
6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]
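For what it's worth, the bare number after each failed entry tuple (17 in every ENTRY FAILED line above) looks like an errno, and ENOENT appears later in an "irregular xtime" warning. Mapping the codes with the standard library (quick sketch, assuming those numbers are in fact errnos):

```python
import errno
import os

# 17 follows every ENTRY FAILED tuple above; 2 (ENOENT) shows up
# later in the "irregular xtime ... ENOENT" warning.
for code in (17, 2):
    print(code, errno.errorcode[code], os.strerror(code))
# 17 EEXIST -> the entry already exists on the slave
# 2  ENOENT -> no such file or directory
```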
That type of entry repeats until
[2015-10-09 11:12:22.590574] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster:
1 crawls, 1 turns
[2015-10-09 11:13:22.653459] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W
[master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster:
irregular xtime for
./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly:
ENOENT
and then there were no more logs until 2015-10-13.
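Incidentally, the percent-encoded directory name in those xsync temp paths is just the slave URL passed through URL quoting; decoding it (illustrative snippet only) confirms which session a directory belongs to:

```python
from urllib.parse import unquote

# Session directory component taken from the xsync temp paths
# in the log excerpt above.
enc = "ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic"
print(unquote(enc))  # ssh://root@palace:gluster://127.0.0.1:static
```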
Thanks,
Wade.
On 16/10/2015 4:33 pm, Aravinda wrote:
Oh ok. I overlooked the status output. Please share the
geo-replication logs from the "james" and "hilton" nodes.
regards
Aravinda
On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
Well, I'm kind of worried about the 3 million failures listed in the
FAILURES column, the timestamp showing that syncing "stalled" 2 days
ago, and the fact that only half of the files have been transferred
to the remote volume.
On 15/10/2015 9:27 pm, Aravinda wrote:
Status looks good. Two master bricks
are Active and participating in syncing. Please let us know
the issue you are observing.
regards
Aravinda
On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
I have twice now tried to configure
geo-replication of our Stripe-Replicate volume to a remote
Stripe volume but it always seems to have issues.
root@james:~# gluster volume info
Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/geo-rep-meta/brick
Brick2: cupid:/data/gluster1/geo-rep-meta/brick
Brick3: hilton:/data/gluster1/geo-rep-meta/brick
Brick4: present:/data/gluster1/geo-rep-meta/brick
Options Reconfigured:
performance.readdir-ahead: on
Volume Name: static
Type: Striped-Replicate
Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/static/brick1
Brick2: cupid:/data/gluster1/static/brick2
Brick3: hilton:/data/gluster1/static/brick3
Brick4: present:/data/gluster1/static/brick4
Options Reconfigured:
auth.allow: 10.x.*
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
root@palace:~# gluster volume info
Volume Name: static
Type: Stripe
Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: palace:/data/gluster1/static/brick1
Brick2: madonna:/data/gluster1/static/brick2
Options Reconfigured:
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
So just to clarify: data is striped over bricks 1 and 3;
bricks 2 and 4 are their replicas.
Can someone help me diagnose the problem and find a
solution?
Thanks in advance,
Wade.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users