On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:
Hello,
I'm trying to track down a problem with my setup (version 3.7.3
on Debian stable).
I have a couple of volumes set up in a 3-node configuration with 1
brick acting as an arbiter for each.
There are 4 volumes set up in a cross-over arrangement across 3
physical servers, like this:
            +----------->[ GigabitEthernet switch ]<------------+
            |                         ^                         |
            v                         v                         v
/-----------------------\ /-----------------------\ /-----------------------\
| web-rep               | | mail-rep              | | cluster-rep           |
|                       | |                       | |                       |
| vols:                 | | vols:                 | | vols:                 |
| system_www1           | | system_www1(arbiter)  | | system_www1           |
| data_www1             | | data_www1(arbiter)    | | data_www1             |
| system_mail1(arbiter) | | system_mail1          | | system_mail1          |
| data_mail1(arbiter)   | | data_mail1            | | data_mail1            |
\-----------------------/ \-----------------------/ \-----------------------/
Now, after a fresh boot-up, everything seems to be running fine.
Then I start copying big files (KVM disk images) from local disk
to gluster mounts.
In the beginning it seems to be running fine (although iowait
seems go so high that it clogs up io operations
at some moments, but that's an issue for later). After some time
the transfer freezes, then
after some (long) time, it advances in a short burst to freeze
again. Another interesting thing is that
I see constant flow of the network traffic on interfaces
dedicated to gluster, even when there's a "freeze".
I have done "gluster volume statedump" at that time of transfer
(file is copied from local disk on cluster-rep
onto local mount of "system_www1" volume). I've observer a
following section in the dump for cluster-rep node:
[xlator.features.locks.system_www1-locks.inode]
path=/images/101/vm-101-disk-1.qcow2
mandatory=0
inodelk-count=12
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
pid = 18446744073709551610, owner=c811600cd67f0000,
client=0x7fbe100df280,
connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
granted at 2015-10-21 11:36:22
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=2195849216, len=131072, pid = 18446744073709551610,
owner=c811600cd67f0000, client=0x7fbe100df280,
connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
granted at 2015-10-21 11:37:45
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0,
start=9223372036854775805, len=1, pid = 18446744073709551610,
owner=c811600cd67f0000, client=0x7fbe100df280,
connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
granted at 2015-10-21 11:36:22
From the statedump, it looks like the self-heal daemon has taken locks
to heal the file, due to which the locks attempted by the client
(mount) are in the blocked state.
In arbiter volumes, the client (mount) takes full locks (start=0,
len=0) for every write(), as opposed to normal replica volumes, which
take range locks (i.e. appropriate start,len values) for that
write(). This is done to avoid network split-brains.
So in normal replica volumes, clients can still write to a file
while a heal is going on, as long as the offsets don't overlap. This
is not the case with arbiter volumes.
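To make the difference concrete, here is a minimal illustrative
sketch (plain Python, not GlusterFS code) of the conflict check,
using the ranges from the statedump above and assuming the inodelk
convention that len=0 means "locked through end of file":

    def covers_to_eof(length):
        # inodelk convention assumed here: len=0 extends to end-of-file,
        # so (start=0, len=0) covers the whole file.
        return length == 0

    def conflicts(a, b):
        # a, b: (start, len) WRITE lock ranges on the same inode.
        a_start, a_len = a
        b_start, b_len = b
        a_end = float("inf") if covers_to_eof(a_len) else a_start + a_len - 1
        b_end = float("inf") if covers_to_eof(b_len) else b_start + b_len - 1
        return a_start <= b_end and b_start <= a_end

    heal_lock     = (2195849216, 131072)  # range lock held by self-heal (see dump)
    replica_write = (0, 131072)           # normal replica: range lock per write()
    arbiter_write = (0, 0)                # arbiter: full-file lock per write()

    print(conflicts(heal_lock, replica_write))  # False -> the write can proceed
    print(conflicts(heal_lock, arbiter_write))  # True  -> the write blocks

So while the healer holds its range lock, a normal replica client
writing at a different offset is unaffected, but an arbiter client's
full-file lock collides with it every time.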
You can look at the client or glustershd logs to see if there are
messages that indicate healing of the file, something along the lines
of "Completed data selfheal on xxx".
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0,
start=0, len=0, pid = 0, owner=c4fd2d78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0,
len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45
There seem to be multiple locks in the BLOCKED state, which doesn't
look normal to me. The other 2 nodes have only 2 ACTIVE locks at the
same time.
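In case it helps with comparing nodes, here is a rough sketch
(Python, assuming the statedump format shown in the excerpt above)
that tallies ACTIVE/BLOCKED inodelk entries per lock domain;
statedump files usually end up under /var/run/gluster/:

    import re
    import sys
    from collections import Counter

    counts = Counter()
    domain = None
    # argv[1]: path to a statedump file, e.g. one of the
    # /var/run/gluster/*.dump.* files produced by "gluster volume statedump"
    with open(sys.argv[1]) as dump:
        for raw in dump:
            line = raw.strip()
            m = re.match(r"lock-dump\.domain\.domain=(\S+)", line)
            if m:
                domain = m.group(1)
                continue
            m = re.match(r"inodelk\.inodelk\[\d+\]\((\w+)\)", line)
            if m and domain:
                counts[(domain, m.group(1))] += 1

    for (dom, state), n in sorted(counts.items()):
        print(dom, state, n)

Run against the dump excerpted above, it would report the one
self-heal ACTIVE lock plus the ACTIVE and BLOCKED counts in the
system_www1-replicate-0 domain.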
Below is "gluster volume info" output.
# gluster volume info
Volume Name: data_mail1
Type: Replicate
Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/mail1
Brick2: mail-rep:/GFS/data/mail1
Brick3: web-rep:/GFS/data/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%
Volume Name: data_www1
Type: Replicate
Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Brick2: web-rep:/GFS/data/www1
Brick3: mail-rep:/GFS/data/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%
Volume Name: system_mail1
Type: Replicate
Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/mail1
Brick2: mail-rep:/GFS/system/mail1
Brick3: web-rep:/GFS/system/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%
Volume Name: system_www1
Type: Replicate
Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Brick2: web-rep:/GFS/system/www1
Brick3: mail-rep:/GFS/system/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%
The issue does not occur when I get rid of the 3rd arbiter brick.
What do you mean by 'getting rid of'? Killing the 3rd brick process
of the volume?
Regards,
Ravi
If there's any additional information missing that I could provide,
please let me know.
Greetings,
Adrian
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users