On 10/18/13 1:59 PM, Theodore Ts'o wrote:
> On Fri, Oct 18, 2013 at 09:08:40AM -0500, Eric Sandeen wrote:
>> On 10/17/13 8:28 PM, Theodore Ts'o wrote:
>>> In the case of a storage device that suddenly disappears, or in the
>>> case of significant file system corruption, this can result in a huge
>>> flood of messages being sent to the console. This can overflow the
>>> file system containing /var/log/messages, or if a serial console is
>>> configured, this can slow down the system so much that a hardware
>>> watchdog can end up triggering forcing a system reboot.
>>
>> Just out of curiosity, after the fs shuts down, is there still a flood
>> of messages? Shouldn't that clamp down on the errors?
>
> Not if we are running with errors=continue.

Maybe the ratelimit should depend on that then?

I'm just concerned about the possibility of filtering messages that,
rather than being a nuisance, are vital to figuring out what went wrong.
(granted, it's probably the first error or two that matters)

Or maybe it's only relevant with errors=continue, and errors=remount-ro
will be self-limiting in any case.

> There are some ugly
> patches in our tree which pipes error notifications to a netlink
> socket, which allows userspace to do something intelligent with
> errors, and because there are some errors where it's safe to continue
> (especially if you are willing to shut down block allocations to the
> block group where you don't trust the allocation bitmap), we tend to
> run with errors=continue.

hm... :)

> I think I mentioned the errors->netlink feature a while back, but
> there wasn't a whole lot of excitement about it, and the patches
> definitely need a lot of cleanup before they would be ready for
> upstream merging. If people are curious, I can look into getting the
> patches sent out, since we just finished rebasing them to 3.11.
>
>> If not, shouldn't it do so? xfs has a lot of short-circuiting if
>> the filesystem is shut down, so it (I think) won't get into paths that
>> will generate more errors.
>
> When xfs "shuts down" the file system, it doesn't allow any read or
> write accesses, right? So it's basically an even stronger version of
> errors=remount-ro. We should perhaps discuss whether it would be
> better to squelch errors if we've remounted the file system read-only,
> or whether we should implement a complete shutdown errors option.

Yeah, there is no errors=continue type option, that is probably too
dangerous in general for the majority of users.

I'd guess that w/ default remount-ro, the error flood isn't a risk.

> And of course, even if we did this, we would still need to squelch
> ext4_warning and ext4_msg output. (Although I agree with Lukas that
> it might not be a bad idea to review some of the messages that either
> get emitted via printk, or which are issued via ext4_msg(KERN_CRIT) to
> see if we should perhaps change some of those to ext4_error.)

*nod*

Thanks,
-Eric

> Regards,
>
> - Ted