On 04/10/2021 00:51, Roger Heflin wrote:
With 10 minute samples anything that happened gets averaged enough
that even the worst event is almost impossible to see.
sar reports times the same way date does, i.e. in local time, so a
12:51 event would be in the 13:00 sample (which started at about 12:50
and ended at 13:00).
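To make the bucketing concrete, here is a rough sketch that maps an event time to the sample it would land in. The helper name sample_for is made up for illustration, and it assumes samples are labelled with the end of their interval and fall exactly on interval boundaries (real sar timestamps can be a second or so off).

```shell
#!/bin/bash
# sample_for EVENT_HH:MM INTERVAL_MINUTES
# Hypothetical helper: print the end time of the sample that covers the
# event, assuming samples end exactly on interval boundaries.
sample_for() {
    local h=${1%%:*} m=${1##*:} interval=$2
    # 10# forces base 10 so values like "08" are not parsed as octal
    local mins=$(( 10#$h * 60 + 10#$m ))
    local end=$(( (mins / interval + 1) * interval ))
    printf '%02d:%02d\n' $(( end / 60 % 24 )) $(( end % 60 ))
}

sample_for 12:51 10   # 10 minute sampling -> 13:00
sample_for 12:51 1    # 1 minute sampling  -> 12:52
```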
What I do see is that during that window your IO rate was about 2x
that of the prior 10 minute windows. With the 1 minute data we would
be able to see if the disk was excessively busy. Your average IOPS
were about 10% of the disk capacity.
I have debugged issues where the badly behaving IO was maxing out
everything for 10 sec on/10 sec off. In the 1 minute data there
appeared to be nothing interesting to see (50% capacity), but it was
playing hell with the interactive apps: during the 10 sec on window,
operations that normally took .5 sec were taking 1-2 seconds, and so
were clearly slow for the users.
With the sample size (60 sec) close to the event size (45 sec) it
should be visible on 1 minute data, but less than clear on 10 minute
data (the other 9.25 minutes average it out and mask it).
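The averaging effect can be put into numbers with a small sketch. The function name avg_busy is made up, and the inputs are assumptions for illustration only: the burst is taken to run the disk at 100% busy, and the rest of the window sits at a flat baseline (10% here, loosely based on the average mentioned above).

```shell
#!/bin/bash
# avg_busy BURST_SECS WINDOW_SECS BASELINE_PCT
# Hypothetical helper: average %busy over one sample window containing a
# single burst at an assumed 100% busy, with the disk at BASELINE_PCT
# for the remainder of the window.
avg_busy() {
    awk -v b="$1" -v w="$2" -v base="$3" \
        'BEGIN { printf "%.2f\n", (b * 100 + (w - b) * base) / w }'
}

avg_busy 45 60 10    # 45 sec burst in a 1 minute sample
avg_busy 45 600 10   # same burst in a 10 minute sample
avg_busy 30 60 0     # 10 sec on/10 sec off for a minute, idle otherwise
```

Under those assumptions the 1 minute sample shows about 78% busy while the 10 minute sample shows about 17%, which is easy to miss. The last call reproduces the 50% figure from the on/off case.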
Run "systemctl edit sysstat-collect.timer"
and add this to the file:
[Timer]
OnCalendar=*:00/1
That will change the sample interval to 1 minute.
If you run this script:
#!/bin/bash
# Collect 10 second iostat samples, one output file per hour of the day.
while true; do
    hour=$(date +%H)
    # 360 samples x 10 seconds = 1 hour per file
    iostat -t -x 10 360 > "filename.${hour}"
done
that will give you 10 sec iostat data, start a new file each hour,
and overwrite each hour's file the next day.
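Once the hourly files exist, something like the following could pull out the busy intervals. This is only a sketch: busy_intervals is a made-up name, and it assumes the usual "iostat -t -x" layout (a timestamp line directly before each "avg-cpu" line, then a "Device" header whose columns include %util). Keep in mind that the first report in each file is the average since boot, not a 10 second sample.

```shell
#!/bin/bash
# busy_intervals LIMIT [FILE...]
# Hypothetical helper: print "timestamp device %util" for every device
# line whose %util is at or above LIMIT. Reads stdin if no files given.
busy_intervals() {
    local limit=$1
    shift
    awk -v limit="$limit" '
        /^avg-cpu/ { ts = prev }            # line before avg-cpu is the timestamp
        /^Device/  {                        # locate the %util column by name
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            in_dev = 1; prev = $0; next
        }
        NF == 0    { in_dev = 0 }           # blank line ends the device table
        in_dev && col && $col + 0 >= limit { print ts, $1, $col }
        { prev = $0 }
    ' "$@"
}

# Usage, assuming the files written by the loop above:
#   busy_intervals 90 filename.12 filename.13
```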
Thanks Roger, I will set those up/running.