Hi all.
I've recently installed three Gluster 3.5 servers: two masters and one geo-replication slave, each with 2 bricks. After some configuration problems everything seems to be working, but I've found some problems with geo-replication.
First I'd like to ask a question, because I couldn't find the answer in the documentation or on any mailing list.
This is the volume configuration:
root@filepre03:/gluster/jbossbricks/pre01/disk01/b01/.glusterfs/changelogs# gluster v info
(master)
Volume Name: jbpre01vol
Type: Distributed-Replicate
Volume ID: 316231f7-20bf-44f6-9d9b-20d4e3b27c2c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: filepre03:/gluster/jbossbricks/pre01/disk01/b01
Brick2: filepre04:/gluster/jbossbricks/pre01/disk01/b01
Brick3: filepre03:/gluster/jbossbricks/pre01/disk02/b02
Brick4: filepre04:/gluster/jbossbricks/pre01/disk02/b02
Options Reconfigured:
diagnostics.brick-log-level: WARNING
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
(geo-replica)
Volume Name: jbpre01slvol
Type: Distribute
Volume ID: 0a4d2f3e-c803-4cfe-971b-2f8107180a69
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: filepre05:/gluster/jbossbricks/pre01/disk01/b01
Brick2: filepre05:/gluster/jbossbricks/pre01/disk02/b02
Options Reconfigured:
diagnostics.brick-log-level: WARNING
Geo-replication is running on bricks b01 and b02:
root@filepre03:/gluster/jbossbricks/pre01/disk01/b01/.glusterfs/changelogs# gluster v g jbpre01vol filepre05::jbpre01slvol status
MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------
filepre03 jbpre01vol /gluster/jbossbricks/pre01/disk01/b01 filepre05::jbpre01slvol Active N/A Changelog Crawl
filepre03 jbpre01vol /gluster/jbossbricks/pre01/disk02/b02 filepre05::jbpre01slvol Active N/A Changelog Crawl
filepre04 jbpre01vol /gluster/jbossbricks/pre01/disk01/b01 filepre05::jbpre01slvol Passive N/A N/A
filepre04 jbpre01vol /gluster/jbossbricks/pre01/disk02/b02 filepre05::jbpre01slvol Passive N/A N/A
Tests are run from another server that mounts the master and slave volumes:
root@testgluster:/mnt/gluster# mount |grep gluster
filepre03:/jbpre01vol on /mnt/gluster/pre01filepre03 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
filepre04:/jbpre01vol on /mnt/gluster/pre01filepre04 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
filepre05:/jbpre01slvol on /mnt/gluster/pre01filepre05 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
My question is about directory dates on the geo-replica: in all my tests, the directory date on the remote server shows the time when replication was executed, not the original date. Is this the usual behavior?
For example:
root@testgluster:/mnt/gluster# mkdir /mnt/gluster/pre01filepre03/TESTDIR1
root@testgluster:/mnt/gluster# echo TEST > pre01filepre03/TESTDIR1/TESTING1
After a while, Gluster has created the directory and the file on the slave, but the directory's date there is the replication time, not the original one:
root@testgluster:/mnt/gluster# ls -d --full-time pre01filepre0*/TESTDIR1
drwxr-xr-x 2 root root 8192 2014-06-30 11:55:18.651528230 +0200 pre01filepre03/TESTDIR1
drwxr-xr-x 2 root root 8192 2014-06-30 11:55:18.652637248 +0200 pre01filepre04/TESTDIR1
drwxr-xr-x 2 root root 8192 2014-06-30 11:56:14.087626822 +0200 pre01filepre05/TESTDIR1 (geo-replica)
However, the file is replicated with its original date (truncated to whole seconds):
root@testgluster:/mnt/gluster# find . -type f -exec ls --full-time {} \;
-rw-r--r-- 1 root root 5 2014-06-30 11:55:18.664637725 +0200 ./pre01filepre04/TESTDIR1/TESTING1
-rw-r--r-- 1 root root 5 2014-06-30 11:55:18.663528750 +0200 ./pre01filepre03/TESTDIR1/TESTING1
-rw-r--r-- 1 root root 5 2014-06-30 11:55:18.000000000 +0200 ./pre01filepre05/TESTDIR1/TESTING1 (geo-replica)
This makes it difficult to validate any syncing error between masters and slave using commands like rsync, because directories are always different:
root@testgluster:/mnt/gluster# rsync -avn pre01filepre03/TESTDIR1/ pre01filepre05
sending incremental file list
./
TESTING1
sent 49 bytes received 18 bytes 134.00 bytes/sec
total size is 5 speedup is 0.07 (DRY RUN)
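One possible way to check master against slave without tripping over directory timestamps is to compare only relative file paths and file contents. The sketch below is an assumption, not something taken from the Gluster documentation: `tree_manifest` and `compare_trees` are hypothetical helper names, and it relies on GNU find, sort, sha256sum and diff being available on the test machine. rsync's `--omit-dir-times` (`-O`) option may be another way to leave directory times out of the comparison.

```shell
#!/bin/sh
# Hypothetical helpers: compare two directory trees by relative path and
# file content only, so directory mtimes never influence the result.

# Print "<sha256>  <relative path>" for every regular file under $1,
# sorted by path so two manifests can be diffed line by line.
tree_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Exit 0 if both trees hold the same files with the same content;
# otherwise print the differing manifest lines and exit non-zero.
compare_trees() {
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    tree_manifest "$1" > "$a"
    tree_manifest "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

With the mounts above, this could be run as `compare_trees /mnt/gluster/pre01filepre03 /mnt/gluster/pre01filepre05`. Reading every file to checksum it is slow on large volumes, so this is only meant for small test trees like this one.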
Next, I would like to ask whether anyone has seen the following problem when a brick on the remote server goes down.
After checking the current status, I killed one brick process on the remote server:
root@filepre03:~# gluster v status jbpre01vol
Status of volume: jbpre01vol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick filepre03:/gluster/jbossbricks/pre01/disk01/b01 49169 Y 8167
Brick filepre04:/gluster/jbossbricks/pre01/disk01/b01 49172 Y 7027
Brick filepre03:/gluster/jbossbricks/pre01/disk02/b02 49170 Y 8180
Brick filepre04:/gluster/jbossbricks/pre01/disk02/b02 49173 Y 7040
NFS Server on localhost 2049 Y 2088
Self-heal Daemon on localhost N/A Y 30873
NFS Server on filepre04 2049 Y 9171
Self-heal Daemon on filepre04 N/A Y 7061
NFS Server on filepre05 2049 Y 1128
Self-heal Daemon on filepre05 N/A Y 1137
root@filepre03:~# gluster v status jbpre01slvol
Status of volume: jbpre01slvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick filepre05:/gluster/jbossbricks/pre01/disk01/b01 49152 Y 6321
Brick filepre05:/gluster/jbossbricks/pre01/disk02/b02 49155 Y 6375
NFS Server on localhost 2049 Y 2088
NFS Server on filepre04 2049 Y 9171
NFS Server on filepre05 2049 Y 1128
root@filepre03:~# gluster v g status
MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------
filepre03 jbpre01vol /gluster/jbossbricks/pre01/disk01/b01 filepre05::jbpre01slvol Active N/A Changelog Crawl
filepre03 jbpre01vol /gluster/jbossbricks/pre01/disk02/b02 filepre05::jbpre01slvol Active N/A Changelog Crawl
filepre04 jbpre01vol /gluster/jbossbricks/pre01/disk01/b01 filepre05::jbpre01slvol Passive N/A N/A
filepre04 jbpre01vol /gluster/jbossbricks/pre01/disk02/b02 filepre05::jbpre01slvol Passive N/A N/A
root@filepre05:/gluster/jbossbricks/pre01# kill -9 6375 (slave: brick 02 process)
root@filepre03:~# gluster v status jbpre01slvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick filepre05:/gluster/jbossbricks/pre01/disk01/b01 49152 Y 6321
Brick filepre05:/gluster/jbossbricks/pre01/disk02/b02 N/A N 6375
NFS Server on localhost 2049 Y 2088
NFS Server on filepre04 2049 Y 9171
NFS Server on filepre05 2049 Y 1128
If I kill only one brick process, geo-replication doesn't show any problem and doesn't detect problems when a client writes to that brick.
I write some files:
root@testgluster:/mnt/gluster# echo "TEST2" >pre01filepre03/TESTFILE2
root@testgluster:/mnt/gluster# echo "TEST3" >pre01filepre03/TESTFILE3
root@testgluster:/mnt/gluster# echo "TEST4" >pre01filepre03/TESTFILE4
root@testgluster:/mnt/gluster# echo "TEST5" >pre01filepre03/TESTFILE5
Then, I check where they are:
root@filepre03:~# find /gluster -name "TESTFILE*" -exec ls -l {} \;
-rw-r--r-- 2 root root 6 Jun 30 16:11 /gluster/jbossbricks/pre01/disk01/b01/TESTFILE3
-rw-r--r-- 2 root root 6 Jun 30 16:11 /gluster/jbossbricks/pre01/disk01/b01/TESTFILE4
-rw-r--r-- 2 root root 6 Jun 30 16:09 /gluster/jbossbricks/pre01/disk01/b01/TESTFILE2
-rw-r--r-- 2 root root 6 Jun 30 16:12 /gluster/jbossbricks/pre01/disk02/b02/TESTFILE5 <--- brick 02
And finally I check the geo-replication status:
root@filepre03:~# gluster v g jbpre01vol filepre05::jbpre01slvol status detail
MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL STATUS FILES SYNCD FILES PENDING BYTES PENDING DELETES PENDING FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
filepre03 jbpre01vol /gluster/jbossbricks/pre01/disk01/b01 filepre05::jbpre01slvol Active N/A Changelog Crawl 3738 0 0 0 0
filepre03 jbpre01vol /gluster/jbossbricks/pre01/disk02/b02 filepre05::jbpre01slvol Active N/A Changelog Crawl 3891 0 0 0 0
filepre04 jbpre01vol /gluster/jbossbricks/pre01/disk01/b01 filepre05::jbpre01slvol Passive N/A N/A 0 0 0 0 0
filepre04 jbpre01vol /gluster/jbossbricks/pre01/disk02/b02 filepre05::jbpre01slvol Passive N/A N/A 0 0 0 0 0
No problem is shown and no file sync is pending, but on the remote server:
root@testgluster:/mnt/gluster# ll pre01filepre05
total 14
drwxr-xr-x 4 root root 4096 Jun 30 16:12 ./
drwxr-xr-x 5 root root 4096 Jun 26 12:57 ../
-rw-r--r-- 1 root root 6 Jun 30 16:09 TESTFILE2
-rw-r--r-- 1 root root 6 Jun 30 16:11 TESTFILE3
-rw-r--r-- 1 root root 6 Jun 30 16:11 TESTFILE4
... only the files written to brick 01 have been replicated.
I can't find any gluster command that returns a failed status.
One more comment: if I kill all brick processes on the remote server, then the geo-replication status changes to faulty, as I expected.
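As a possible workaround while geo-replication status does not report a single dead slave brick, the brick state of the slave volume could be checked directly from the `gluster volume status` output. This is only a sketch under assumptions: `offline_bricks` and `check_volume_bricks` are hypothetical helper names, and the parsing assumes the 3.5-style column layout shown above (Online is the second-to-last column of each Brick line).

```shell
#!/bin/sh
# Hypothetical helper: read `gluster volume status <vol>` output on stdin
# and print every brick whose Online column is not "Y".
# Assumes lines of the form:
#   Brick <host>:<path>  <port>  <online>  <pid>
offline_bricks() {
    awk '$1 == "Brick" && $(NF-1) != "Y" { print $2 }'
}

# Exit non-zero (and list the bricks on stderr) if any brick of
# volume $1 is reported offline.
check_volume_bricks() {
    down=$(gluster volume status "$1" | offline_bricks)
    [ -z "$down" ] && return 0
    echo "offline bricks in $1: $down" >&2
    return 1
}
```

For example, `check_volume_bricks jbpre01slvol` could be run from cron or a monitoring check on any node that can query the slave volume. It only reports that a brick is down; it does not tell which files were missed by geo-replication.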
Any additional info will be appreciated.
Thank you
Eva
_______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://supercolony.gluster.org/mailman/listinfo/gluster-devel