On Fri, Dec 03, 2010 at 12:55:15PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > We currently have a global error message buffer in cmn_err that is > protected by a spin lock that disables interrupts. Recently there > have been reports of NMI timeouts occurring when the console is > being flooded by SCSI error reports due to cmn_err() getting stuck > trying to print to the console while holding this lock (i.e. with > interrupts disabled). The NMI watchdog is seeing this CPU as > non-responding and so is triggering a panic. While the trigger for > the reported case is SCSI errors, pretty much anything that spams > the kernel log could cause this to occur. > > Realistically the only reason that we have the intemediate message > buffer is to prepend the correct kernel log level prefix to the log > message. The only reason we have the lock is to protect the global > message buffer and the only reason the message buffer is global is > to keep it off the stack. Hence if we can avoid needing a global > message buffer we avoid needing the lock, and we can do this with a > small amount of cleanup and some preprocessor tricks: > > 1. clean up xfs_cmn_err() panic mask functionality to avoid > needing debug code in xfs_cmn_err() > 2. remove the couple of "!" message prefixes that still exist that > the existing cmn_err() code steps over. > 3. redefine CE_* levels directly to KERN_* > 4. redefine cmn_err() and friends to use printk() directly > via variable argument length macros. > > By doing this, we can completely remove the cmn_err() code and the > lock that is causing the problems, and rely solely on printk() > serialisation to ensure that we don't get garbled messages. > > A series of followup patches is really needed to clean up all the > cmn_err() calls and related messages properly, but that results in a > series that is not easily back portable to enterprise kernels. Hence > this initial fix is only to address the direct problem in the lowest > impact way possible. FWIW, while these macros are the best way to make a simple backport is possible, I just discovered that mainline has a %pV format operator that allows an implementation like: void xfs_fs_cmn_err( const char *lvl, struct xfs_mount *mp, const char *fmt, ...) { struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; printk("%sFilesystem %s: %pV", lvl, mp->m_fsname, &vaf); va_end(args); BUG_ON(strncmp(lvl, KERN_EMERG, strlen(KERN_EMERG)) == 0); } Would this be a preferable method for replacing the existing implementations, or are the macros good enough as the first step of a mainline cleanup? Cheers,,, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs