Every so often, one of our servers will go into what I can only describe as an undefined state: it pings, but there's zero access - you can't ssh in, and if I plug a keyboard and monitor into the server itself, I can see the monitor's live (it's not the "monitor turned off" color), but there is zero response to the keyboard. The upshot is that I wind up having to power cycle it.

Well, it happened again on one of our servers Friday evening, as I found this morning. Looking at the logs, I see that the last sar entry is

  10:20:01 PM all 34.38 0.00 8.29 0.00 0.00 57.33

One of my users dropped me an email at 22:45 that it was "off", and the last things I see in /var/log/messages are one of those annoying

  Feb 21 22:26:23 <server> kernel: INFO: task perl:20596 blocked for more than 120 seconds.
  Feb 21 22:26:23 <server> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I also see

  Feb 21 22:26:23 <server> kernel: perl D ffffffff80158250 0 20596 20557

which, as I just found by googling, means (the D) that the process is in an uninterruptible kernel state. In addition, the stack trace has some nfs messages:

  Feb 21 22:26:23 <server> kernel: [<ffffffff886b58d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd

So, it *appears* to be either an NFS issue or a NIC issue. The user's home directory server is running CentOS 6.5, and the server that hung is running 5.10. Running mount on the formerly hung server, su'd to his account, shows merely nfs as the filesystem type, so I'm guessing it's NFSv3. Looking at lsmod and /var/log/dmesg, I see it's running the tg3 NIC driver.

Anyone else seeing this, and if so, any thoughts on the matter? Note that I've also had this on Penguins, which are all Supermicro and use the igb NIC driver, but the one this past weekend is a Dell, so it's not just one system.

mark

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
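
P.S. Rather than guessing at the NFS version from the mount output, something like the following could read it out of /proc/mounts. This is only a rough sketch: the function name nfs_versions is made up for illustration, and it assumes the kernel reports the version as a vers= (or nfsvers=) mount option, or through the nfs4 filesystem type. If neither shows up, it reports None rather than guessing.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to its protocol version string (None if not reported).

    mounts_text is the content of /proc/mounts (or any text in the same format).
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            # Assumption: the version appears as vers=N or nfsvers=N
            if key in ("vers", "nfsvers") and value:
                version = value
                break
        # An nfs4 filesystem type implies version 4 even without a vers= option
        if version is None and fields[2] == "nfs4":
            version = "4"
        versions[fields[1]] = version
    return versions
```

On the affected box, this would be called with the text of /proc/mounts, e.g. nfs_versions(open("/proc/mounts").read()), and the entry for the user's home directory mount would show which version is actually in use.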