I found another log that I wasn't aware of in
/var/log/glusterfs/brick, that is the mount log; I had confused the log
files. In this file I see a lot of entries like this one:
[2018-08-15 16:41:19.568477] I
[addr.c:55:compare_addr_and_update] 0-/mnt/brick1/gv1: allowed =
"172.20.36.10", received addr = "172.20.36.11"
[2018-08-15 16:41:19.568527] I
[addr.c:55:compare_addr_and_update] 0-/mnt/brick1/gv1: allowed =
"172.20.36.11", received addr = "172.20.36.11"
[2018-08-15 16:41:19.568547] I [login.c:76:gf_auth]
0-auth/login: allowed user names:
7107ccfa-0ba1-4172-aa5a-031568927bf1
[2018-08-15 16:41:19.568564] I [MSGID: 115029]
[server-handshake.c:793:server_setvolume] 0-gv1-server: accepted
client from
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
(version: 3.12.6)
[2018-08-15 16:41:19.582710] I [MSGID: 115036]
[server.c:527:server_rpc_notify] 0-gv1-server: disconnecting
connection from
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
[2018-08-15 16:41:19.582830] I [MSGID: 101055]
[client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down
connection
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
So I see a lot of disconnections, right? This might be why the
self healing is triggered all the time?
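
A minimal sketch for tallying these events per client, in case it helps quantify "a lot". It assumes that in the actual file each entry sits on a single line (the wrapping above comes from the mail client) and that MSGID 115029 and 115036 mark accept and disconnect, as in the excerpt. The function name and the regular expressions are illustrative only, not part of any Gluster tool.

```python
import re
from collections import Counter

# MSGIDs as they appear in the excerpt above:
# 115029 = "accepted client from ...", 115036 = "disconnecting connection from ..."
EVENTS = {"115029": "accepted", "115036": "disconnected"}
MSGID_RE = re.compile(r"\[MSGID: (\d+)\]")
# The client identifier is the first token after "client from" / "connection from".
CLIENT_RE = re.compile(r"(?:client from|connection from)\s+(\S+)")


def count_connection_events(lines):
    """Count accept/disconnect entries per client identifier."""
    counts = Counter()
    for line in lines:
        msgid = MSGID_RE.search(line)
        if not msgid or msgid.group(1) not in EVENTS:
            continue  # some other kind of log entry
        client = CLIENT_RE.search(line)
        name = client.group(1) if client else "unknown"
        counts[(name, EVENTS[msgid.group(1)])] += 1
    return counts
```

Passing an open file object for the brick log to count_connection_events() and printing the resulting counter would show whether the same client keeps reconnecting and how often.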
Thanks!
Pablo.
On 08/14/2018 09:15 AM, Pablo Schandin wrote:
Thanks for the info!
I cannot see any entries in the mount log besides one line every
time it rotates:
[2018-08-13 06:25:02.246187] I
[glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No
change in volfile,continuing
But I did find in the glfsheal-gv1.log of the volumes some kind
of server-client connection that was disconnected and now it
connects using a different port. The block of log per each run
is kind of long so I'm copying it into a pastebin.
https://pastebin.com/bp06rrsT
Maybe this has something to do with it?
Thanks!
Pablo.
On 08/11/2018 12:19 AM, Ravishankar N wrote:
On 08/10/2018 11:25 PM, Pablo Schandin wrote:
Hello everyone!
I'm having some trouble with something, but I'm not quite
sure what yet. I'm running GlusterFS 3.12.6 on
Ubuntu 16.04. I have two servers (nodes) in the cluster in
replica mode. Each server has 2 bricks. As the servers are
KVM hosts running several VMs, one brick has some VMs locally
defined in it and the second brick is replicated from
the other server. It has data, but no actual writing is
being done on it except for the replication.
                 Server 1                                     Server 2
Volume 1 (gv1):  Brick 1 defined VMs (read/write)   ---->   Brick 1 replicated qcow2 files
Volume 2 (gv2):  Brick 2 replicated qcow2 files     <-----  Brick 2 defined VMs (read/write)
So, the main issue arose when I got a nagios alarm that
warned about a file listed to be healed. And then it
disappeared. I came to find out that every 5 minutes, the
self heal daemon triggers the healing and this fixes it. But
looking at the logs I have a lot of entries in the
glustershd.log file like this:
[2018-08-09 14:23:37.689403] I [MSGID:
108026] [afr-self-heal-common.c:1656:afr_log_selfheal]
0-gv1-replicate-0: Completed data selfheal on
407bd97b-e76c-4f81-8f59-7dae11507b0c. sources=[0] sinks=1
[2018-08-09 14:44:37.933143] I [MSGID: 108026]
[afr-self-heal-common.c:1656:afr_log_selfheal]
0-gv2-replicate-0: Completed data selfheal on
73713556-5b63-4f91-b83d-d7d82fee111f. sources=[0] sinks=1
The qcow2 files are being healed several times a day (up to
30 on occasion). As I understand it, this means that a data
heal occurred on the files with gfids 407b... and 7371..., from
source to sink. Local server to replica server? Is it OK for
the shd to heal files in the replicated brick that
supposedly has no writes on it besides the mirroring? How
does that work?
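
To put a number on "several times a day", the completed heals can be counted per day and gfid. This is a rough sketch that assumes each glustershd.log entry is on one line and has the same shape as the two entries quoted above; the function name and regular expression are made up for illustration.

```python
import re
from collections import Counter

# Matches entries shaped like the ones quoted above, e.g.
# "[2018-08-09 14:23:37.689403] I [MSGID: 108026] ... Completed data selfheal on <gfid>. ..."
HEAL_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}) [\d:.]+\].*"
    r"Completed (\w+) selfheal on ([0-9a-f-]{36})"
)


def heals_per_day(lines):
    """Count completed self-heals keyed by (day, gfid, heal type)."""
    counts = Counter()
    for line in lines:
        match = HEAL_RE.search(line)
        if match:
            day, kind, gfid = match.groups()
            counts[(day, gfid, kind)] += 1
    return counts
```

Feeding it the lines of glustershd.log gives one counter entry per file per day, which makes it easy to compare heal times against network or load events.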
In AFR, for writes, there is no notion of local/remote brick. No
matter from which client you write to the volume, it gets sent
to both bricks. i.e. the replication is synchronous and real
time.
How does AFR replication work? The file with gfid 7371...
is the qcow2 root disk of an owncloud server with 17GB of
data. It does not seem big enough to be a bottleneck of
some sort, I think.
Also, I was investigating the directory tree in
brick/.glusterfs/indices and I noticed that in both xattrop
and dirty I always have a file named xattrop-xxxxxx
and dirty-xxxxxx, respectively. I read that the xattrop file is like a
parent file or handle to reference other files created there
as hardlinks with the gfid as name, for the shd to heal. Is it the same
for the ones in the dirty dir?
Yes, before the write, the gfid gets captured inside dirty on
all bricks. If the write is successful, it gets removed. In
addition, if the write fails on one brick, the other brick will
capture the gfid inside xattrop.
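
The bookkeeping described above can be pictured with a toy model. This is only a sketch of the description in this mail, not of the real AFR code: the Brick class, the write_ok switch and replicated_write() are hypothetical names, and since the text does not say what happens to the dirty entry after a partial failure, the sketch simply leaves it in place.

```python
class Brick:
    """Toy stand-in for a brick's .glusterfs/indices bookkeeping."""

    def __init__(self, name):
        self.name = name
        self.dirty = set()    # gfids with a write in flight
        self.xattrop = set()  # gfids that need healing on another brick
        self.write_ok = True  # hypothetical switch to simulate a failed write


def replicated_write(bricks, gfid):
    """Simulate one write sent to every brick; return True if all succeeded."""
    # Before the write, the gfid is captured inside dirty on all bricks.
    for brick in bricks:
        brick.dirty.add(gfid)

    good = [brick for brick in bricks if brick.write_ok]
    if len(good) == len(bricks):
        # Successful write: the dirty entry is removed again.
        for brick in bricks:
            brick.dirty.discard(gfid)
        return True

    # The write failed somewhere: the bricks where it succeeded
    # capture the gfid inside xattrop so the shd can heal it later.
    for brick in good:
        brick.xattrop.add(gfid)
    return False
```

With two bricks and write_ok set to False on one of them, only the healthy brick ends up with the gfid in xattrop, which matches the picture of one brick recording that the other needs a heal.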
Any help will be greatly appreciated. Thanks!
If frequent heals are triggered, it could mean there are
frequent network disconnects from the clients to the bricks as
writes happen. You can check the mount logs and see if that is
the case and investigate possible network issues.
HTH,
Ravi
Pablo.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users