Also, I've checked the shd log files and found that, for some reason, shd
constantly reconnects to the bricks: [1]
Please note that the fix [2] suggested by Pranith does not help; the VIRT
value still grows:
===
root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===
I do not know why it is reconnecting, but I suspect the leak happens on
reconnect.
CCing Pranith.
[1] http://termbin.com/brob
[2] http://review.gluster.org/#/c/14053/
06.06.2016 12:21, Kaushal M wrote:
Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
what I'm saying below doesn't apply.
We saw problems when encrypted transports were used, because the RPC
layer was not reaping threads (doing pthread_join) when a connection
ended. This led to similar observations of huge VIRT and relatively
small RSS.
I'm not sure how multi-threaded shd works, but it could be leaking
threads in a similar way.
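One way to check this would be to sample the thread count and the virtual size of the shd process over time. Below is a minimal sketch, assuming the usual Linux /proc/<pid>/status layout; the function names are made up for illustration and are not part of any Gluster tooling. If threads never exit, Threads should grow along with VmSize. If threads exit but are never joined, I'd expect VmSize to keep growing while Threads stays roughly flat, because the stacks of unjoined threads stay mapped.

```python
def parse_proc_status(text):
    """Extract Threads, VmSize and VmRSS from /proc/<pid>/status content.

    Vm* values are reported in kB; Threads is a plain count.
    """
    wanted = {"Threads", "VmSize", "VmRSS"}
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            # Vm* values look like "7415248 kB"; Threads is a bare integer.
            result[key] = int(value.split()[0])
    return result


def read_status(pid):
    """Read and parse the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

Calling read_status() with the glustershd PID every few minutes and logging the result should show which of the two patterns applies.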
On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko
<oleksandr@xxxxxxxxxxxxxx> wrote:
Hello.
We use v3.7.11 in a replica 2 setup between 2 nodes, plus 1 dummy node for
keeping volume metadata.
Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node:
===
root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
--xlator-option
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===
That is ~73G. RSS seems to be OK (~522M). Here is the statedump of the
glustershd process: [1]
Also, here is the sum of the sizes reported in the statedump:
===
# cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
353276406
===
That is ~337 MiB.
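For anyone who wants to redo the arithmetic, here is a rough Python equivalent of the awk one-liner plus the unit conversions. It is only a sketch: it assumes that every line starting with "size=" in the statedump carries a plain integer byte count, and that ps reports VSZ and RSS in KiB. The helper names are hypothetical.

```python
KIB = 1024


def sum_statedump_sizes(text):
    """Sum the values of all lines that start with 'size=' (bytes)."""
    total = 0
    for line in text.splitlines():
        if line.startswith("size="):
            # Assumes an integer value after the first '='.
            total += int(line.split("=", 1)[1])
    return total


def kib_to_gib(kib):
    """Convert KiB (as printed by ps for VSZ/RSS) to GiB."""
    return kib / KIB**2


def bytes_to_mib(nbytes):
    """Convert bytes (as summed from the statedump) to MiB."""
    return nbytes / KIB**2
```

With the numbers above, kib_to_gib(76552820) comes out at about 73.0 and bytes_to_mib(353276406) at about 336.9, so the statedump accounts for well under 1% of the virtual size.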
Also, here are the VIRT values from the 2 replica nodes:
===
root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/44ec3f29003eccedf894865107d5db90.socket
--xlator-option
*replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket
--xlator-option
*replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
===
Those are 5 to 6G, which is much less than the dummy node has, but they
still look too big to us.
Should we care about the huge VIRT value on the dummy node? Also, how would
one debug that?
Regards,
Oleksandr.
[1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel