Hi Wade,
There seems to be an issue with syncing the existing data in the
volume using the Xsync crawl.
(To give some background: when geo-rep is started, it first performs a
filesystem crawl (Xsync) and syncs all existing data to the slave; the
session then switches to CHANGELOG mode.)
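As an aside, the end of an Xsync pass is marked by a "finished hybrid crawl syncing" line in the master log, so a quick scan for that message tells you whether the initial crawl completed (illustrative Python sketch, not part of gsyncd):

```python
# Minimal sketch: scan geo-replication master log text for the
# message gsyncd emits when an Xsync (hybrid) crawl pass finishes.
def xsync_pass_finished(log_text):
    return "finished hybrid crawl syncing" in log_text

sample = ("[2015-10-08 15:31:26.471216] I "
          "[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: "
          "finished hybrid crawl syncing, stime: (1444278018, 482251)")
print(xsync_pass_finished(sample))  # True
```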
We are looking into this.
Is there any specific reason for choosing a Stripe volume? It does not
seem to have been extensively tested with geo-rep.
Thanks,
Saravana
On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
The relevant portions of the log appear to be as follows.
Everything seemed fairly normal (though quite slow) until
[2015-10-08 15:31:26.471216] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I
[syncdutils(/data/gluster1/static/brick1):220:finalize]
<top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop]
RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize]
<top>: exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] <top>: slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] <top>: worker specs: [('/data/gluster1/static/brick1', 'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor]
Monitor:
------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor]
Monitor: starting gsyncd worker
[2015-10-08 15:31:35.837651] I
[changelogagent(agent):75:__init__] ChangelogAgent: Agent
listining...
[2015-10-08 15:31:35.841150] I
[gsyncd(/data/gluster1/static/brick1):649:main_i] <top>:
syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I
[master(/data/gluster1/static/brick1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-10-08 15:31:38.543802] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster:
using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I
[master(/data/gluster1/static/brick1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-10-08 15:31:38.544924] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster:
using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I
[master(/data/gluster1/static/brick1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-10-08 15:31:38.546406] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster:
using 'tar over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549632] I
[resource(/data/gluster1/static/brick1):1432:service_loop]
GLUSTER: Register time: 1444278698
[2015-10-08 15:31:38.582277] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster:
primary master with volume id
3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster:
crawl interval: 60 seconds
[2015-10-08 15:31:38.587405] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster:
primary master with volume id
3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster:
crawl interval: 60 seconds
[2015-10-08 15:31:38.593844] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1
crawls, 0 turns
[2015-10-08 15:32:38.644370] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I
[master(/data/gluster1/static/brick1):1252:crawl] _GMaster:
processing xsync changelog
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W
[master(/data/gluster1/static/brick1):803:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188,
'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html',
'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W
[master(/data/gluster1/static/brick1):803:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188,
'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html',
'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')
...
[2015-10-08 15:33:38.236779] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png', 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png', 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82', 'gid': 1000, 'mode': 33206, 'entry': '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png', 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W
[master(/data/gluster1/static/brick1):1010:process] _GMaster:
changelogs XSYNC-CHANGELOG.1444278758 could not be processed -
moving on...
[2015-10-08 15:33:43.616425] W
[master(/data/gluster1/static/brick1):1014:process] _GMaster:
SKIPPED GFID =
6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]
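For what it's worth, the bare number after each failed entry tuple (17 in every ENTRY FAILED line above) looks like an errno, and ENOENT appears later in an "irregular xtime" warning. Mapping the codes with the standard library (quick sketch, assuming those numbers are in fact errnos):

```python
import errno
import os

# 17 follows every ENTRY FAILED tuple above; 2 (ENOENT) shows up
# later in the "irregular xtime ... ENOENT" warning.
for code in (17, 2):
    print(code, errno.errorcode[code], os.strerror(code))
# 17 EEXIST -> the entry already exists on the slave
# 2  ENOENT -> no such file or directory
```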
That type of entry repeats until
[2015-10-09 11:12:22.590574] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster:
1 crawls, 1 turns
[2015-10-09 11:13:22.653459] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster:
starting hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W
[master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster:
irregular xtime for
./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly:
ENOENT
and then there were no more logs until 2015-10-13.
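Incidentally, the percent-encoded directory name in those xsync temp paths is just the slave URL passed through URL quoting; decoding it (illustrative snippet only) confirms which session a directory belongs to:

```python
from urllib.parse import unquote

# Session directory component taken from the xsync temp paths
# in the log excerpt above.
enc = "ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic"
print(unquote(enc))  # ssh://root@palace:gluster://127.0.0.1:static
```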
Thanks,
Wade.
On 16/10/2015 4:33 pm, Aravinda wrote:
Oh ok. I overlooked the status output. Please share the
geo-replication logs from the "james" and "hilton" nodes.
regards
Aravinda
On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
Well, I'm kind of worried about the 3 million failures listed in the
FAILURES column, the timestamp showing that syncing "stalled" 2 days
ago, and the fact that only half of the files have been transferred
to the remote volume.
On 15/10/2015 9:27 pm, Aravinda wrote:
Status looks good. Two master bricks
are Active and participating in syncing. Please let us know
the issue you are observing.
regards
Aravinda
On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
I have twice now tried to configure
geo-replication of our Stripe-Replicate volume to a remote
Stripe volume but it always seems to have issues.
root@james:~# gluster volume info
Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/geo-rep-meta/brick
Brick2: cupid:/data/gluster1/geo-rep-meta/brick
Brick3: hilton:/data/gluster1/geo-rep-meta/brick
Brick4: present:/data/gluster1/geo-rep-meta/brick
Options Reconfigured:
performance.readdir-ahead: on
Volume Name: static
Type: Striped-Replicate
Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/static/brick1
Brick2: cupid:/data/gluster1/static/brick2
Brick3: hilton:/data/gluster1/static/brick3
Brick4: present:/data/gluster1/static/brick4
Options Reconfigured:
auth.allow: 10.x.*
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
root@palace:~# gluster volume info
Volume Name: static
Type: Stripe
Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: palace:/data/gluster1/static/brick1
Brick2: madonna:/data/gluster1/static/brick2
Options Reconfigured:
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
So just to clarify: data is striped over bricks 1 and 3;
bricks 2 and 4 are their replicas.
Can someone help me diagnose the problem and find a
solution?
Thanks in advance,
Wade.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users