Thanks for the feedback, everyone.
This is a very lightly loaded system with just 3 users ATM and very
little going on across the network (just editing code files etc.). The
problem occurred again yesterday. For about 10 minutes my KDE desktop
locked up in 20-second bursts, and then the problem went away for the
rest of the day. During that time the desktop and server were 98.5%
idle and pings continued fine. A kconsole window running "ls /home"
every 5 seconds was stuck in the ls. I had kconsole windows open
running ping, top and ls, and although I couldn't operate the
desktop (switch virtual desktops etc.), the ping and top windows kept
updating fine. There were no error messages in /var/log/messages on
either system, and the sar stats showed nothing out of the ordinary.
I am pretty sure the Ethernet network is fine, including cables,
switches, Ethernet adapters etc. Pings are fine. It just appears that
every now and then client programs see a huge (> 20 s) delay when
accessing /home, which points to NFS issues. Most of the system stats
counters only report the amount of access, not the latency of each
access, and latency is what I need to track down the problem, as there
are few disk and network accesses going on.
As I said, all was fine on this system until about a month ago, and
the only obvious changes are the Fedora updates, so I wondered if anyone
knew whether there had been changes to the NFS stack recently and/or how
to log peak NFS latencies?
Terry
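One low-tech way to log peak latencies, sketched here under assumptions rather than as an established tool: extend the "ls /home every 5 seconds" idea so that each listing is timed and any slow one is printed with a timestamp. The function names `time_listdir` and `probe`, the 1-second threshold and the 5-second interval are illustrative choices, not anything from this thread. Because the NFS client caches attributes and directory entries, some listings may be answered locally, so this measures the delay programs actually see rather than pure server round-trip time.

```python
import os
import time
from datetime import datetime
from typing import Optional


def time_listdir(path: str) -> float:
    """Return the wall-clock seconds taken to list `path`, like a plain `ls`."""
    start = time.monotonic()
    os.listdir(path)
    return time.monotonic() - start


def probe(path: str = "/home", interval: float = 5.0, threshold: float = 1.0,
          count: Optional[int] = None) -> float:
    """List `path` every `interval` seconds, print each listing slower than
    `threshold` seconds with a timestamp, and return the worst latency seen.
    Runs forever when `count` is None (hypothetical helper, not a real tool)."""
    worst = 0.0
    done = 0
    while count is None or done < count:
        latency = time_listdir(path)
        worst = max(worst, latency)
        if latency >= threshold:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {path}: {latency:.2f} s (worst so far {worst:.2f} s)",
                  flush=True)
        done += 1
        time.sleep(interval)
    return worst
```

For example, calling `probe("/home", threshold=2.0)` from a small script left running in a terminal would print a line only for listings that took 2 s or longer, which gives timestamps to line up against the sar data.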
On 26/09/2021 18:06, Roger Heflin wrote:
Make sure you have sar/sysstat enabled and configured to take 1-minute samples.
sar -d will show disk performance. If one of the disks "blips" at the
firmware level (maybe while working on a hard-to-read block), the %util on
that device will be significantly higher than on all the other disks, so it
will stand out. Then you can look deeper at the SMART data.
Plain sar will show your CPU/system time, sar -n DEV will
show detailed network traffic, and sar -n EDEV will show network errors.
With it set to 1 minute you should be able to detect most blips.
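To find the disk that "stands out" after the fact, the sar data can be exported with sadf and filtered. A minimal sketch, assuming the semicolon-separated output of `sadf -d <datafile> -- -d`, with a `#`-prefixed header that names `timestamp`, `DEV` and `%util` columns, and '.' as the decimal separator. The function name `busy_disks` and the 50% default threshold are hypothetical; columns are looked up by name because the exact column set varies between sysstat versions.

```python
import csv
import io
from typing import List, Tuple


def busy_disks(sadf_text: str, threshold: float = 50.0) -> List[Tuple[str, str, float]]:
    """Return (timestamp, device, %util) for every sample with %util >= threshold."""
    header = None
    hits = []
    for row in csv.reader(io.StringIO(sadf_text), delimiter=";"):
        if not row:
            continue
        if row[0].startswith("#"):
            # Header line such as "# hostname;interval;timestamp;DEV;...;%util":
            # strip the leading "# " and remember the column order.
            header = [row[0].lstrip("# ")] + row[1:]
            continue
        if header is None:
            continue  # data before any header: column meaning unknown, skip it
        rec = dict(zip(header, row))
        util = float(rec["%util"])
        if util >= threshold:
            hits.append((rec["timestamp"], rec["DEV"], util))
    return hits
```

Fed the text from, e.g., `sadf -d /var/log/sa/sa26 -- -d` (on Fedora the daily files are typically under /var/log/sa), it would list each 1-minute sample in which a device was at least 50% busy; on an otherwise idle system, one device showing up there is the kind of blip described above.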
On Sun, Sep 26, 2021 at 10:26 AM Jamie Fargen <jamie@xxxxxxxxxxxxxx> wrote:
Are there network switches under your control? It sounds similar to what happens when the MTUs on the systems do not match, or when one system's MTU is set above the value on the switch ports.
Next time the issue occurs use ping with the do not fragment flag.
e.g. $ ping -M do -s 8972 ip.address
This example should be the largest payload that works with an MTU of 9000: for IPv4 there are 28 bytes of overhead (a 20-byte IP header plus an 8-byte ICMP header).
Second, are you sure no one is attaching a device to the network that duplicates the MAC address of your NFS server, or perhaps of the system that stalls? If the switches are manageable, you should check that the MAC addresses are being learned on the correct ports.
-Jamie
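The arithmetic behind the 8972 figure, as a small sketch (the constant and function names are illustrative): the largest unfragmented ping payload is the MTU minus the 20-byte IPv4 header (no options) and the 8-byte ICMP echo header.

```python
IPV4_HEADER = 20  # bytes: IPv4 header without options
ICMP_HEADER = 8   # bytes: ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    print(f"MTU {mtu}: ping -M do -s {max_ping_payload(mtu)}")
# prints:
# MTU 1500: ping -M do -s 1472
# MTU 9000: ping -M do -s 8972
```

If `ping -M do -s 1472` gets replies but `-s 8972` does not, some hop on the path is likely limited to a 1500-byte MTU.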
On Sun, Sep 26, 2021 at 10:24 AM Tom Horsley <horsley1953@xxxxxxxxx> wrote:
On Sun, 26 Sep 2021 10:26:19 -0300
George N. White III wrote:
If you have cron jobs that use a lot of network bandwidth it may work
fine until some network issue causing lots of retransmits bogs it down.
Which is why you should check the dumb stuff first! Has a critter
chewed on the ethernet cable to the server?
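To check whether retransmits climb during a stall, the kernel's cumulative TCP counters in /proc/net/snmp can be sampled. A sketch, assuming the usual layout of a `Tcp:` line of field names followed by a `Tcp:` line of values; the helper names are hypothetical, and the counter covers all TCP traffic on the host, not just NFS.

```python
import time


def tcp_retrans(snmp_text: str) -> int:
    """Return the cumulative TCP RetransSegs counter from /proc/net/snmp text."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    # First "Tcp:" line = field names, second = values (drop the "Tcp:" label).
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return int(dict(zip(names, values))["RetransSegs"])


def retrans_per_minute(interval: float = 60.0, path: str = "/proc/net/snmp") -> float:
    """Read the counter twice, `interval` seconds apart, and scale to per minute."""
    def read() -> int:
        with open(path) as f:
            return tcp_retrans(f.read())

    before = read()
    time.sleep(interval)
    after = read()
    return (after - before) * 60.0 / interval
```

For NFS specifically, `nfsstat -rc` on the client reports RPC-level call and retransmission counts, which can be compared before and after a stall.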
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure