This seems to be causing problems with the cluster. I get the following on one node just before the cluster hangs:

  gfs_controld[]: retrieve_plocks: ckpt open error 12 gfs

The only reference I can find when googling this is in plock.c:

  rv = saCkptCheckpointOpen(ckpt_handle, &name, NULL,
                            SA_CKPT_CHECKPOINT_READ, 0, &h);
  if (rv == SA_AIS_ERR_TRY_AGAIN) {
          log_group(mg, "retrieve_plocks: ckpt open retry");
          sleep(1);
          goto open_retry;
  }
  if (rv != SA_AIS_OK) {
          log_error("retrieve_plocks: ckpt open error %d %s", rv, mg->name);
          return;
  }

I'm not quite sure what CkptCheckpoint is, but from looking at the openais code it seems to be some form of fault tolerance. I found a post about a possible bug in the saCkptCheckpointOpen function:

https://lists.linux-foundation.org/pipermail/openais/2006-September/008360.html

I have just installed newer versions of cman, gfs-utils, openais and kmod-gfs and upgraded the kernel, and am going to see whether I still get hangs. It has been running for a few hours now with node resets and I/O bursts, and seems to be behaving a little better.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
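A note on the number in that log line: the "12" is the raw SaAisErrorT value returned by saCkptCheckpointOpen, printed with %d. As far as I recall from the SA Forum AIS header (saAis.h), 12 is SA_AIS_ERR_NOT_EXIST, which would suggest the checkpoint being opened did not exist at that moment rather than a transient retry condition. Below is a minimal sketch of a lookup helper for decoding such values. The name ais_err_name is hypothetical (it is not part of gfs_controld or openais), and the table is written from memory, so it should be checked against the saAis.h installed on the node.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map a numeric SaAisErrorT value, as printed by
 * log_error("... %d ..."), to its symbolic name.
 * ASSUMPTION: the values below are taken from memory of the SA Forum
 * saAis.h header; verify them against the installed header before relying
 * on them. Only the first 14 codes are listed.
 */
static const char *ais_err_name(int rv)
{
    static const char *const names[] = {
        [1]  = "SA_AIS_OK",
        [2]  = "SA_AIS_ERR_LIBRARY",
        [3]  = "SA_AIS_ERR_VERSION",
        [4]  = "SA_AIS_ERR_INIT",
        [5]  = "SA_AIS_ERR_TIMEOUT",
        [6]  = "SA_AIS_ERR_TRY_AGAIN",
        [7]  = "SA_AIS_ERR_INVALID_PARAM",
        [8]  = "SA_AIS_ERR_NO_MEMORY",
        [9]  = "SA_AIS_ERR_BAD_HANDLE",
        [10] = "SA_AIS_ERR_BUSY",
        [11] = "SA_AIS_ERR_ACCESS",
        [12] = "SA_AIS_ERR_NOT_EXIST",
        [13] = "SA_AIS_ERR_NAME_TOO_LONG",
        [14] = "SA_AIS_ERR_EXIST",
    };
    const size_t count = sizeof(names) / sizeof(names[0]);

    /* Index 0 and anything outside the table have no entry. */
    if (rv < 0 || (size_t)rv >= count || names[rv] == NULL)
        return "unknown";
    return names[rv];
}
```

Calling ais_err_name(rv) next to the existing %d in a log_error call would make messages like the one above easier to search for; whether that is worth patching into plock.c is a judgment call.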