Here are three profile reports from
60-second intervals:
Ubuntu 18.04 system with low load:
Ubuntu 14.04 system with low load:
Ubuntu 14.04 system with high load:
Each of these systems is "gluster1"
in the report. In each cluster, there are two bricks,
gluster1:/md3/gluster and gluster2:/md3/gluster. The systems are
identical hardware-wise. (I noticed this morning that the 18.04
upgrade had set the CPU to the powersave governor. I changed it to
the performance governor before running the profile, but that
doesn't seem to have changed the iowait behavior or the profile
report appreciably.)
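For anyone wanting to confirm the governor on each node, here is a minimal sketch that reads it from sysfs. It assumes the standard Linux cpufreq layout under /sys/devices/system/cpu; the function names are made up for illustration, and on machines without cpufreq support the result is simply empty.

```python
from collections import Counter
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so the grandparent
        # directory name is "cpuN".
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def summarize(governors):
    """Count how many CPUs use each governor."""
    return Counter(governors.values())
```

Calling `summarize(read_governors())` on each node should make it obvious if some cores are still on powersave after the change.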
What jumps out at me for the 18.04
systems is:
1) The excessively high average
latency of the FINODELK operations on the *local* brick (i.e.
gluster1:/md3/gluster). The latency is far lower for these
FINODELK operations against the other node's brick
(gluster2:/md3/gluster). This is puzzling to me.
2) Almost double the average
latency for FSYNC operations against both the gluster1 and
gluster2 bricks.
On the 14.04 systems, the number of
FINODELK operations performed during the 60-second interval is
much lower (even on the high-load system), and the latencies are
lower.
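To compare the per-brick FINODELK and FSYNC numbers across the three reports without eyeballing them, a rough parsing sketch follows. It assumes each report has "Brick: host:/path" header lines followed by rows of the form "%-latency, avg us, min us, max us, calls, FOP"; that layout is an assumption and may differ between gluster versions, so the regular expressions may need adjusting.

```python
import re

# Assumed row layout: pct, avg "us", min "us", max "us", call count, FOP name.
ROW = re.compile(
    r"^\s*(?P<pct>[\d.]+)\s+(?P<avg>[\d.]+) us\s+(?P<min>[\d.]+) us\s+"
    r"(?P<max>[\d.]+) us\s+(?P<calls>\d+)\s+(?P<fop>[A-Z_]+)\s*$"
)
BRICK = re.compile(r"^\s*Brick:\s*(?P<brick>\S+)\s*$")


def parse_profile(text):
    """Return {(brick, fop): {"avg_us", "min_us", "max_us", "calls"}}.

    If a brick lists the same FOP more than once (for example in separate
    cumulative and interval sections), the later row overwrites the earlier.
    """
    stats = {}
    brick = None
    for line in text.splitlines():
        header = BRICK.match(line)
        if header:
            brick = header.group("brick")
            continue
        row = ROW.match(line)
        if row and brick is not None:
            stats[(brick, row.group("fop"))] = {
                "avg_us": float(row.group("avg")),
                "min_us": float(row.group("min")),
                "max_us": float(row.group("max")),
                "calls": int(row.group("calls")),
            }
    return stats
```

With the parsed dictionaries from two reports, looking up `("gluster1:/md3/gluster", "FINODELK")` in each gives the latency and call counts side by side.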
Regards,
-Kartik
On 2/21/19 12:18 AM, Amar Tumballi
Suryanarayan wrote:
If you have both systems available for comparison, can
you get the 'gluster profile info' output from each? That helps a
bit to understand the issue.
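For reference, profiling is driven through the "gluster volume profile VOLNAME start|info|stop" subcommands. Below is a small sketch for collecting the output from a script. The volume name "myvol" in the usage note is a placeholder, the helper names are made up for illustration, and the script is assumed to run as a user allowed to invoke the gluster CLI.

```python
import subprocess

VALID_ACTIONS = ("start", "info", "stop")


def profile_command(volume, action):
    """Build the argv list for 'gluster volume profile <volume> <action>'."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported profile action: {action!r}")
    return ["gluster", "volume", "profile", volume, action]


def profile_info(volume, run=subprocess.run):
    """Run 'profile info' and return its text output.

    'run' is injectable so the function can be exercised without a
    gluster installation.
    """
    result = run(
        profile_command(volume, "info"),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A typical session would run `profile_command("myvol", "start")` once, wait for the interval of interest, and then save the text returned by `profile_info("myvol")` from each site.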
We're running gluster on two hypervisors running Ubuntu. When we
upgraded from Ubuntu 14.04 to 18.04, it upgraded gluster from 3.4.2
to 3.13.2. Ever since the upgrade, we've been seeing substantially
higher iowait on the system, as measured by top and iotop, and
iotop indicates that glusterfsd is the culprit. For some reason,
glusterfsd is doing more disk reads and/or those reads are being
held up at a greater rate. The guest VMs, whose images are hosted
on the gluster volume, are also seeing more iowait. This is causing
inconsistent responsiveness from the services hosted on the VMs.
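To put a number on the iowait difference between the 14.04 and 18.04 hosts (rather than reading it off top), one option is to sample the aggregate "cpu" line of /proc/stat twice and compute the share of time spent in iowait. This is a sketch assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative.

```python
def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Only the first eight counters are summed; guest time is already
    included in user time, so it is left out of the total.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter in the assumed field order.
    return 100.0 * deltas[4] / total
```

Reading /proc/stat, sleeping 60 seconds, reading it again, and passing both parsed samples to `iowait_percent` gives a figure that lines up with the 60-second profile interval.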
I'm looking for any recommendations on how to troubleshoot and/or
resolve this problem. We have other sites that are still running
14.04, so I can compare/contrast any configuration parameters and
performance.
The block scheduler on 14.04 was set to deadline, and on 18.04 it
was set to cfq. But changing the 18.04 scheduler to deadline didn't
make any difference.
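To double-check which scheduler is actually active on the disks backing the brick, the kernel exposes it in /sys/block/DEVICE/queue/scheduler, with the active choice shown in square brackets (for example "noop deadline [cfq]"). A minimal sketch follows; the device name in the usage note is a placeholder and the function names are illustrative.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed scheduler name, or None if none is bracketed."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device, sys_block="/sys/block"):
    """Read and parse the scheduler file for one block device."""
    path = Path(sys_block) / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())
```

For example, `read_scheduler("sda")` would report the setting for a disk named sda. If the brick sits on a software RAID device, it may be worth checking the member disks as well as the md device itself.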
I was wondering whether glusterfsd on 18.04 isn't caching as much
as it should. We tried increasing performance.cache-size
substantially but that didn't make any difference.
Another option we're considering but haven't tried yet is upgrading
to gluster 5.3 by back-porting the package from Ubuntu 19.04 to
18.04. Does anyone think this might help?
Is there any particular debug logging we could set up or other
commands we could run to troubleshoot this better? Any thoughts,
suggestions, ideas would be greatly appreciated.
Thanks,
-Kartik