Re: [Centos7x64] Geo-replication problem glusterfs 3.7.0-2

wodel youchi <wodel.youchi@xxxxxxxxx> · Mon, 25 May 2015 13:25:24 +0100

Hi, and thanks for your replies.

For Kotresh : No, I am not using tar ssh for my geo-replication.

For Aravinda: I had to recreate my slave volume from scratch and restart the geo-replication.

If I have thousands of files with this problem, do I have to apply the fix to each of them? Is there an easier way?
Can checkpoints help me in this situation?
And more importantly, what can cause this problem?

I am syncing containers that hold a lot of small files. Would tar over ssh be more suitable for that?

PS: I tried to execute this command on the Master
bash generate-gfid-file.sh localhost:data2   $PWD/get-gfid.sh    /tmp/master_gfid_file.txt

but I got errors with files that have a blank (space) in their names, for example: Admin Guide.pdf
The script sees two files, Admin and Guide.pdf, and get-gfid.sh then returns "no such file or directory" errors.
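
The symptom described above is typical of whitespace word splitting in a shell pipeline. A minimal sketch that sidesteps it by walking the tree in Python, where a path is always passed around as one string. It assumes the volume is FUSE-mounted and that the GFID can be read from the virtual xattr `glusterfs.gfid.string`; both are assumptions, as are the function names, and the output is not claimed to match the format that generate-gfid-file.sh writes.

```python
import os
from typing import Callable, Iterator, Tuple

# Assumption: virtual xattr exposing the GFID on a GlusterFS FUSE mount.
GFID_XATTR = "glusterfs.gfid.string"


def read_gfid(path: str) -> str:
    """Read the GFID of one path (Linux only, needs a GlusterFS mount)."""
    return os.getxattr(path, GFID_XATTR).decode().strip("\x00")


def iter_gfids(
    root: str, get_gfid: Callable[[str], str] = read_gfid
) -> Iterator[Tuple[str, str]]:
    """Yield (gfid, path relative to root) for every directory and file."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # The name is never split, so "Admin Guide.pdf" stays one entry.
            yield get_gfid(full), os.path.relpath(full, root)
```

Calling `iter_gfids("/mnt/data2")` (a hypothetical mount point) and printing each pair would give one line per file, whatever characters the names contain.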

thanks.

2015-05-25 7:00 GMT+01:00 Aravinda <avishwan@xxxxxxxxxx>:
This looks like a GFID conflict issue, not the tarssh issue.

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'e529a399-756d-4cb1-9779-0af2822a0d94', 'gid': 0, 'mode': 33152, 'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb', 'op': 'CREATE'}, 2)

    Data: {'uid': 0,
           'gfid': 'e529a399-756d-4cb1-9779-0af2822a0d94',
           'gid': 0,
           'mode': 33152,
           'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb',
           'op': 'CREATE'}

    and Error: 2

During creation of "main.mdb" the RPC failed with error number 2, i.e., ENOENT. This error occurs when the parent directory does not exist or exists with a different GFID.

In this case the parent GFID "874799ef-df75-437b-bc8f-3fcd58b54789" does not exist on the slave.
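
The same reading of the log entry can be sketched in code: the tuple after "ENTRY FAILED:" is a Python literal, so it can be parsed to pull out the parent GFID and the error name. This assumes each entry sits on a single line in the real log file (the mail wraps it), and the function name is hypothetical.

```python
import ast
import errno
import re

# Captures the "(<dict>, <errno>)" tuple that follows "ENTRY FAILED:".
ENTRY_FAILED = re.compile(r"ENTRY FAILED: (\(.*\))\s*$")


def parse_entry_failure(line: str):
    """Return details of one ENTRY FAILED log line, or None if it is not one."""
    match = ENTRY_FAILED.search(line)
    if match is None:
        return None
    entry, err = ast.literal_eval(match.group(1))
    # 'entry' looks like '.gfid/<parent-gfid>/<basename>'.
    _, parent_gfid, name = entry["entry"].split("/", 2)
    return {
        "op": entry["op"],
        "gfid": entry["gfid"],
        "parent_gfid": parent_gfid,
        "name": name,
        "error": errno.errorcode.get(err, str(err)),
    }
```

Collecting the distinct `parent_gfid` values over a whole log gives the list of parent directories to check on the slave.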

To fix the issue,

-----------------

Find the parent directory of "main.mdb".

Get the GFID of that directory using getfattr.

Check the GFID of the same directory on the slave (to confirm that the GFIDs are different).

Delete that directory on the slave.

Set the virtual xattr on that directory and on all the files inside it.

    setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <DIR>

    setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>

Geo-rep will recreate the directory with the proper GFID and start syncing.
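
For a directory with many entries, the two setfattr commands can be applied in one pass. A minimal sketch, assuming `<DIR>` is a path on a mount of the master volume and that subdirectories should get the xattr as well (the steps above only mention the directory and its files). The function names are hypothetical; `os.setxattr` is Linux only.

```python
import os
from typing import Callable, List

# xattr name and value taken from the setfattr commands above.
TRIGGER_XATTR = "glusterfs.geo-rep.trigger-sync"


def set_trigger(path: str) -> None:
    """Equivalent of: setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <path>"""
    os.setxattr(path, TRIGGER_XATTR, b"1")


def trigger_sync(directory: str, apply: Callable[[str], None] = set_trigger) -> List[str]:
    """Apply the trigger to the directory first, then to everything below it."""
    apply(directory)
    touched = [directory]
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            apply(path)  # names containing spaces are passed through intact
            touched.append(path)
    return touched
```

Passing `apply=print` gives a dry run that only lists the paths that would be touched.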

Let us know if you need any help.

--

regards

Aravinda

On 05/25/2015 10:54 AM, Kotresh Hiremath Ravishankar wrote:

Hi Wodel,

Is the sync mode tar over ssh (i.e., is config use_tarssh set to true)?

If yes, there is a known issue with it, and a patch is already up in master.

But it can be resolved in either of two ways.

1. If the required sync mode is tar over ssh, just disable sync_xattrs, which is true by default.

     gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config sync_xattrs false

2. If it is OK to change the sync mode to rsync, please do so.

     gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config use_tarssh false

NOTE: rsync supports syncing of ACLs and xattrs, whereas tar over ssh does not.

       In 3.7.0-2, tar over ssh should be used with sync_xattrs set to false.
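
The choice between the two workarounds can be sketched as a small helper that only builds the command line. It assumes both settings are changed through the `config <option> <value>` form shown in option 1; the function names and the `keep_tarssh` parameter are hypothetical, and nothing is executed.

```python
from typing import List


def georep_config_cmd(master_vol: str, slave_host: str, slave_vol: str,
                      option: str, value: str) -> List[str]:
    """Build the argv for one geo-replication config change."""
    return ["gluster", "vol", "geo-rep", master_vol,
            f"{slave_host}::{slave_vol}", "config", option, value]


def tarssh_workaround(master_vol: str, slave_host: str, slave_vol: str,
                      keep_tarssh: bool) -> List[str]:
    """Pick workaround 1 (keep tar over ssh) or workaround 2 (switch to rsync)."""
    if keep_tarssh:
        # Workaround 1: keep tar over ssh, stop syncing xattrs.
        return georep_config_cmd(master_vol, slave_host, slave_vol, "sync_xattrs", "false")
    # Workaround 2: fall back to rsync.
    return georep_config_cmd(master_vol, slave_host, slave_vol, "use_tarssh", "false")
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a master node once the volume and host names have been checked.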

Hope this helps.

Thanks and Regards,

Kotresh H R

----- Original Message -----

From: "wodel youchi" <wodel.youchi@xxxxxxxxx>

To: "gluster-users" <gluster-users@xxxxxxxxxxx>

Sent: Sunday, May 24, 2015 3:31:38 AM

Subject:  [Centos7x64] Geo-replication problem glusterfs 3.7.0-2

Hi,

I have two gluster servers in replicated mode as MASTERS

and one server for replicated geo-replication.

I've updated my glusterfs installation to 3.7.0-2 on all three servers.

I've recreated my slave volumes.

I've started the geo-replication; it worked for a while and now I have some problems:

1- Files/directories are not deleted on the slave

2- New files/directories are not synced to the slave.

I have these lines on the active master

[2015-05-23 06:21:17.156939] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'e529a399-756d-4cb1-9779-0af2822a0d94', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb', 'op': 'CREATE'}, 2)

[2015-05-23 06:21:17.158066] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'b4bffa4c-2e88-4b60-9f6a-c665c4d9f7ed', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.hdb', 'op': 'CREATE'}, 2)

[2015-05-23 06:21:17.159154] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'9920cdee-6b87-4408-834b-4389f5d451fe', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.db', 'op': 'CREATE'}, 2)

[2015-05-23 06:21:17.160242] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'307756d2-d924-456f-b090-10d3ff9caccb', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.ndb', 'op': 'CREATE'}, 2)

[2015-05-23 06:21:17.161283] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'69ebb4cb-1157-434b-a6e9-386bea81fc1d', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/COPYING', 'op': 'CREATE'}, 2)

[2015-05-23 06:21:17.162368] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'7d132fda-fc82-4ad8-8b6c-66009999650c', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/daily.cld', 'op': 'CREATE'}, 2)

[2015-05-23 06:21:17.163718] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'd8a0303e-ba45-4e45-a8fd-17994c34687b', 'gid': 0, 'mode': 16832, 'entry':

'.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/clamav-54acc14b44e696e1cfb4a75ecc395fe0',

'op': 'MKDIR'}, 2)

[2015-05-23 06:21:17.165102] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'49d42bf6-3146-42bd-bc29-e704927d6133', 'gid': 0, 'mode': 16832, 'entry':

'.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/clamav-debec3aa6afe64bffaee8d099e76f3d4',

'op': 'MKDIR'}, 2)

[2015-05-23 06:21:17.166147] W [master(/mnt/brick2/brick):792:log_failures]

_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':

'1ddb93ae-3717-4347-910f-607afa67cdb0', 'gid': 0, 'mode': 33152, 'entry':

'.gfid/49d42bf6-3146-42bd-bc29-e704927d6133/clamav-704a1e9a3e2c97ccac127632d7c6b8e4',

'op': 'CREATE'}, 2)
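
The numeric fields in these entries read as a raw file mode and a trailing error number. A small sketch that decodes them with the Python standard library; treating the trailing number as an errno is an assumption here, and the function name is hypothetical.

```python
import errno
import stat
from typing import Tuple


def describe(mode: int, err: int) -> Tuple[str, str]:
    """Return (ls-style mode string, symbolic errno name)."""
    return stat.filemode(mode), errno.errorcode.get(err, str(err))


# Values copied from the log entries above.
print(describe(33152, 2))  # the CREATE entries
print(describe(16832, 2))  # the MKDIR entries
```

On Linux this reports `-rw-------` for the CREATE entries and `drwx------` for the MKDIR entries, both failing with `ENOENT`.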

On the slave there are a lot of lines like this:

[2015-05-22 07:53:57.071999] W [fuse-bridge.c:1970:fuse_create_cbk]

0-glusterfs-fuse: 25833: /.gfid/03a5a40b-c521-47ac-a4e3-916a6df42689 => -1

(Operation not permitted)

in the active master I have 3.7 GB of XSYNC-CHANGELOG.xxxxxxx files in

/var/lib/misc/glusterfsd/data2/ssh%3A%2F%2Froot%4010.10.10.10%3Agluster%3A%2F%2F127.0.0.1%3Aslavedata2/e55761a256af4acfe9b4a419be62462a/xsync
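
The long directory name in that path is percent-encoded. A short sketch that decodes it to show which geo-replication session the xsync directory belongs to; the variable names are illustrative.

```python
from urllib.parse import unquote

# Directory name copied from the path above.
encoded = "ssh%3A%2F%2Froot%4010.10.10.10%3Agluster%3A%2F%2F127.0.0.1%3Aslavedata2"

# Percent-decoding turns %3A into ':', %2F into '/', and %40 into '@'.
session = unquote(encoded)
print(session)
```

This prints `ssh://root@10.10.10.10:gluster://127.0.0.1:slavedata2`.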

I don't know if this is normal.

any idea?

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

http://www.gluster.org/mailman/listinfo/gluster-users
