Ok. Yes, I can see in the core dump that the code died after a call to qb_rb_chunk_alloc() from _blackbox_vlogger(). Turning off the blackbox logger seems to have made the crashes go away. The system now appears stable, whereas before I would see multiple crashes within just 5 minutes of operation. It seems to die when a new chunk is allocated in the ring buffer and a new chunk header is inserted.
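For anyone following along, here is a rough sketch of what "allocate a chunk and insert a chunk header" means in a ring buffer. This is not libqb's code: every name in it (toy_rb, toy_rb_chunk_write, TOY_RB_SIZE and so on) is hypothetical, and the layout (a 4-byte length header followed by the payload, with the oldest chunks reclaimed when space runs out) is an assumption made purely for illustration. It only shows why the size and wrap-around checks at this step matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RB_SIZE 64u                  /* capacity in bytes (hypothetical) */
#define TOY_HDR_SIZE sizeof(uint32_t)    /* header = payload length */

struct toy_rb {
    uint8_t buf[TOY_RB_SIZE];
    size_t read_pt;   /* offset of the oldest chunk header */
    size_t write_pt;  /* offset of the next free byte */
    size_t used;      /* bytes currently occupied */
};

static void toy_rb_init(struct toy_rb *rb)
{
    memset(rb, 0, sizeof(*rb));
}

/* Copy n bytes in, wrapping at the end of the buffer. */
static void toy_put(struct toy_rb *rb, const void *src, size_t n)
{
    const uint8_t *p = src;
    for (size_t i = 0; i < n; i++) {
        rb->buf[rb->write_pt] = p[i];
        rb->write_pt = (rb->write_pt + 1) % TOY_RB_SIZE;
    }
    rb->used += n;
}

/* Consume n bytes, wrapping; dst may be NULL to discard them. */
static void toy_get(struct toy_rb *rb, void *dst, size_t n)
{
    uint8_t *p = dst;
    assert(n <= rb->used);  /* a corrupt header would trip this */
    for (size_t i = 0; i < n; i++) {
        if (p != NULL)
            p[i] = rb->buf[rb->read_pt];
        rb->read_pt = (rb->read_pt + 1) % TOY_RB_SIZE;
    }
    rb->used -= n;
}

/* Drop the oldest chunk (header plus payload) to free space. */
static void toy_rb_reclaim(struct toy_rb *rb)
{
    uint32_t len;
    toy_get(rb, &len, sizeof(len));
    toy_get(rb, NULL, len);
}

/* Write one chunk: header first, then payload. Returns 0, or -1 if the
 * chunk can never fit. Oldest chunks are overwritten when space is short. */
static int toy_rb_chunk_write(struct toy_rb *rb, const void *data, uint32_t len)
{
    size_t need = TOY_HDR_SIZE + (size_t)len;
    if (need > TOY_RB_SIZE)
        return -1;
    while (TOY_RB_SIZE - rb->used < need)
        toy_rb_reclaim(rb);
    toy_put(rb, &len, sizeof(len));
    toy_put(rb, data, len);
    return 0;
}

/* Read the oldest chunk into out. Returns the payload length, or -1 if
 * the buffer is empty or out is too small (nothing is consumed then). */
static long toy_rb_chunk_read(struct toy_rb *rb, void *out, size_t out_size)
{
    size_t saved_read = rb->read_pt, saved_used = rb->used;
    uint32_t len;
    if (rb->used == 0)
        return -1;
    toy_get(rb, &len, sizeof(len));
    if (len > out_size) {
        rb->read_pt = saved_read;  /* undo the header read */
        rb->used = saved_used;
        return -1;
    }
    toy_get(rb, out, len);
    return (long)len;
}
```

In this toy version, a header whose length field disagrees with what is actually stored would send the reclaim step past valid data, which is the general kind of failure one would look for around a chunk allocation. Whether that is what happens in the real code is not established here.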
Thanks for helping us get to a stable state. It's not a permanent fix but it greatly improves our situation. Please let me know if I can assist in reproducing this issue. Thanks again for your help.
- Rob P.

On 5/12/2013 11:07 PM, Angus Salkeld wrote:
Thanks for checking that out. I am looking at this bug. I don't have 14 nodes to test on right now, so I am trying other means to reproduce it (multiplying the log messages). I also have an Ubuntu 12.10 machine and am trying with that. I'll let you know if I can reproduce it here.

If you want to temporarily disable the blackbox, you can edit exec/logsys.c:

    qb_log_ctl(QB_LOG_BLACKBOX, QB_LOG_CONF_ENABLED, QB_TRUE);

Just change QB_TRUE to QB_FALSE.

-Angus
--
Robert Parsons
Chief Information Officer
TAP Publishing Company
174 Fourth St
Crossville, TN 38557-0509
931-484-5137