On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
> My 3 node cluster ran tests for 53 hours before hitting a problem.
>
> Node cl031 hit the 1st problem CMAN: killed by STARTTRANS or
> NOMINATE. There is a DLM assert on cl031 also, but that is
> after a whole bunch of debug output. The full logs are
> here (http://developer.osdl.org/daniel/GFS/test.12jan2005/)
>
> Any ideas on what is going on?
>
> Here is simplified output (in the README file):
> test started Jan Wed 12 17:18
> hung after Fri Jan 14 22:00
>
> cl031 got an error in just under 53 hours.
> ==========================================
> Jan 14 22:00:38 cl031 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages

It's the usual thing. missing messages.

patrick