Re: [PATCH] ext4: add ratelimiting to ext4 messages

Eric Sandeen <sandeen@xxxxxxxxxx> · Sat, 19 Oct 2013 18:04:55 -0500

On 10/18/13 1:59 PM, Theodore Ts'o wrote:
> On Fri, Oct 18, 2013 at 09:08:40AM -0500, Eric Sandeen wrote:
>> On 10/17/13 8:28 PM, Theodore Ts'o wrote:
>>> In the case of a storage device that suddenly disappears, or in the
>>> case of significant file system corruption, this can result in a huge
>>> flood of messages being sent to the console.  This can overflow the
>>> file system containing /var/log/messages, or if a serial console is
>>> configured, this can slow down the system so much that a hardware
>>> watchdog can end up triggering, forcing a system reboot.
>>
>> Just out of curiosity, after the fs shuts down, is there still a flood
>> of messages?  Shouldn't that clamp down on the errors?
> 
> Not if we are running with errors=continue.

Maybe the ratelimit should depend on that then?  I'm just concerned about
the possibility of filtering messages that, rather than being a nuisance,
are vital to figuring out what went wrong.

(granted, it's probably the first error or two that matters)

Or maybe it's only relevant with errors=continue, and errors=remount-ro
will be self-limiting in any case.
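
To make the idea concrete, here is a rough user-space sketch of a ratelimit
that only applies under errors=continue. It is not the patch under discussion
and not kernel code: the enum, the struct, and both function names are
hypothetical, and the window/burst logic is only loosely modeled on the
kernel's ratelimit state. Time is passed in explicitly so the behaviour is
easy to follow.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the errors= mount option. */
enum err_policy { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };

/* Hypothetical limiter state; loosely modeled on a window/burst ratelimit. */
struct msg_ratelimit {
    long interval;  /* window length, in seconds */
    int burst;      /* messages allowed per window */
    long begin;     /* start time of the current window */
    int printed;    /* messages emitted in the current window */
    int missed;     /* messages suppressed in the current window */
};

/* Return true if a message may be emitted at time `now`. */
static bool msg_ratelimit_ok(struct msg_ratelimit *rs, long now)
{
    if (now - rs->begin >= rs->interval) {
        /* Window expired: report what was dropped, then start over. */
        if (rs->missed)
            fprintf(stderr, "%d messages suppressed\n", rs->missed);
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}

/*
 * Throttle only under errors=continue. The other policies are assumed to be
 * self-limiting, so their messages always pass through untouched.
 */
static bool should_emit(enum err_policy policy, struct msg_ratelimit *rs,
                        long now)
{
    if (policy != ERRORS_CONTINUE)
        return true;
    return msg_ratelimit_ok(rs, now);
}
```

With interval = 5 and burst = 2, the first two errors in a window get through,
later ones are counted as missed until the window rolls over, and anything
reported under errors=remount-ro bypasses the limiter entirely.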

> There are some ugly
> patches in our tree which pipe error notifications to a netlink
> socket, which allows userspace to do something intelligent with
> errors, and because there are some errors where it's safe to continue
> (especially if you are willing to shut down block allocations to the
> block group where you don't trust the allocation bitmap), we tend to
> run with errors=continue.

hm... :)

> I think I mentioned the errors->netlink feature a while back, but
> there wasn't a whole lot of excitement about it, and the patches
> definitely need a lot of cleanup before they would be ready for
> upstream merging.  If people are curious, I can look into getting the
> patches sent out, since we just finished rebasing them to 3.11.
> 
>> If not, shouldn't it do so?  xfs has a lot of short-circuiting if
>> the filesystem is shut down, so it (I think) won't get into paths that
>> will generate more errors.
> 
> When xfs "shuts down" the file system, it doesn't allow any read or
> write accesses, right?  So it's basically an even stronger version of
> errors=remount-ro.  We should perhaps discuss whether it would be
> better to squelch errors if we've remounted the file system read-only,
> or whether we should implement a complete shutdown errors option.

Yeah, xfs has no errors=continue type option; that is probably too
dangerous in general for the majority of users.

I'd guess that w/ default remount-ro, the error flood isn't a risk.

> And of course, even if we did this, we would still need to squelch
> ext4_warning and ext4_msg output.  (Although I agree with Lukas that
> it might not be a bad idea to review some of the messages that either
> get emitted via printk, or which are issued via ext4_msg(KERN_CRIT) to
> see if we should perhaps change some of those to ext4_error.)

*nod*

Thanks,
-Eric

> Regards,
> 
> 						- Ted
> 
