Morning all,

I have a cluster of three RHEL5-x86_64 machines (all up to date) sharing a GFS filesystem on a Coraid AoE unit. Last night, I shut the whole thing down to replace batteries in a couple of UPS units and brought things back up without issue. About an hour later, access to the shared filesystem stalled on all three machines. It was late, so I figured I had missed something, brought the cluster back down and up again, and it was fine. At about 4am this morning, it did it again. By the time I got to the site, people were already screaming, so I simply restarted it again.

I've had some time (and coffee) now to look through the logs and am finding little of value. I see two anomalies, but I don't know what they mean. The first is a number of lines like this:

    openais[3953]: [TOTEM] Retransmit List: 3eb233

The second is a set of messages like this:

    kernel: INFO: task nfsd:3523 blocked for more than 120 seconds.

These are followed by stack dumps with dlm_lock at the top.

Some searching suggests this may be an issue with my switch. Is that reasonable? Is there a way to get further diagnostics? This cluster has been in service for a couple of years, so I'm leaning toward something being broken rather than misconfigured.

Any help would be appreciated.

Paul

--
Paul Dugas • 522 Black Canyon Park, Canton GA 30114 USA • Paul@xxxxxxxx • +1.404.932.1355

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster