Re: Geo-rep failing initial sync

Hi Wade,

There seems to be an issue with syncing the existing data in the volume using the Xsync crawl.
(To give some background: when geo-rep is started, it first performs a filesystem crawl (Xsync) to sync all existing data to the slave, and then the session switches to CHANGELOG mode.)
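
If it is useful while we investigate, you can check which mode a session is in from the master. The commands below are only a sketch using the names from this thread; change_detector is the config option name as I recall it, and the CRAWL STATUS column of the detailed status shows which crawl each worker is currently doing (Hybrid/History/Changelog Crawl):

# Show the configured change detection mechanism for this session
# (option name assumed to be change_detector; values are xsync / changelog)
gluster volume geo-replication static gluster-b1::static config change_detector

# CRAWL STATUS column shows the crawl each worker is currently in
gluster vol geo-rep static ssh://gluster-b1::static status detail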

We are looking into this.

Is there any specific reason for choosing a Stripe volume? That combination has not been extensively tested with geo-rep.

Thanks,
Saravana

On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
The relevant portions of the log appear to be as follows. Everything seemed fairly normal (though quite slow) until

[2015-10-08 15:31:26.471216] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] <top>: slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] <top>: worker specs: [('/data/gluster1/static/brick1', 'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-10-08 15:31:35.841150] I [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.543802] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.544924] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-08 15:31:38.546406] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549632] I [resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER: Register time: 1444278698
[2015-10-08 15:31:38.582277] I [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-10-08 15:31:38.587405] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-10-08 15:31:38.593844] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls, 0 turns
[2015-10-08 15:32:38.644370] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I [master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188, 'entry': '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html', 'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188, 'entry': '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html', 'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')


...

[2015-10-08 15:33:38.236779] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png', 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png', 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png', 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W [master(/data/gluster1/static/brick1):1010:process] _GMaster: changelogs XSYNC-CHANGELOG.1444278758 could not be processed - moving on...
[2015-10-08 15:33:43.616425] W [master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED GFID = 6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]


That type of entry repeats until

[2015-10-09 11:12:22.590574] I [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 crawls, 1 turns
[2015-10-09 11:13:22.653459] I [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W [master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular xtime for ./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly: ENOENT


and then there were no more logs until 2015-10-13.
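
Incidentally, if I am reading the ENTRY FAILED tuples correctly, the bare number in the middle looks like an errno, and errno 17 on Linux is EEXIST ("File exists"), so the skipped MKNODs appear to be creates for entries that already exist on the slave. Quick check on any node, just to confirm the mapping (nothing gluster-specific here):

# errno 17 -> EEXIST ("File exists") on Linux
python -c 'import errno, os; print(errno.errorcode[17]); print(os.strerror(17))'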

Thanks,
Wade.

On 16/10/2015 4:33 pm, Aravinda wrote:
Oh ok, I overlooked the status output. Please share the geo-replication logs from the "james" and "hilton" nodes.

regards
Aravinda

On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
Well, I'm kind of worried about the 3 million failures listed in the FAILURES column, the timestamp showing that syncing "stalled" two days ago, and the fact that only half of the files have been transferred to the remote volume.

On 15/10/2015 9:27 pm, Aravinda wrote:
Status looks good. Two master bricks are Active and participating in syncing. Please let us know the issue you are observing.
regards
Aravinda
On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
I have now twice tried to configure geo-replication of our Stripe-Replicate volume to a remote Stripe volume, but it always seems to run into issues.
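
For reference, the session was set up with what I understand to be the standard sequence, roughly along these lines (from memory, so treat it as a sketch rather than the exact commands):

gluster system:: execute gsec_create
gluster volume geo-replication static gluster-b1::static create push-pem
gluster volume geo-replication static gluster-b1::static start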

root@james:~# gluster volume info

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/geo-rep-meta/brick
Brick2: cupid:/data/gluster1/geo-rep-meta/brick
Brick3: hilton:/data/gluster1/geo-rep-meta/brick
Brick4: present:/data/gluster1/geo-rep-meta/brick
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: static
Type: Striped-Replicate
Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/static/brick1
Brick2: cupid:/data/gluster1/static/brick2
Brick3: hilton:/data/gluster1/static/brick3
Brick4: present:/data/gluster1/static/brick4
Options Reconfigured:
auth.allow: 10.x.*
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

root@palace:~# gluster volume info

Volume Name: static
Type: Stripe
Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: palace:/data/gluster1/static/brick1
Brick2: madonna:/data/gluster1/static/brick2
Options Reconfigured:
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on

root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A


So just to clarify, data is striped over bricks 1 and 3, and bricks 2 and 4 hold their replicas.

Can someone help me diagnose the problem and find a solution?

Thanks in advance,
Wade.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
