Re: Sleeping function called in invalid context

Andreas Dilger <adilger@xxxxxxxxx> · Thu, 11 Aug 2016 13:52:31 -0600

> On Aug 5, 2016, at 8:56 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> 
> On Fri, Aug 05, 2016 at 09:29:59AM +0300, Nikolay Borisov wrote:
>>> The easiest way to fix this is to defer the ext4_commit_super() to a
>>> workqueue.  We only need this in the errors=continue case, and in that
>>> scenario we're not in a hurry when the superblock gets written out.
>> 
>> Is errors=continue the default if nothing is specified at mount
>> time? I don't have it set explicitly:
>> 
>> /dev/vda / ext4 rw,relatime,data=ordered 0 0
> 
> Yes, it's the default.  I keep wondering whether we should change the
> default to remount-ro or even panic, since people sometimes don't
> notice the "file system has been corrupted" messages, and then
> they can end up losing a lot more data than if we had forced them to
> address the issue right away.

I'd definitely be in favour of making the default "errors=remount-ro".
We've been setting that explicitly for years, since otherwise people
may not notice their ongoing problems until the filesystem completely
explodes.

Related to that, there is a Lustre patch to handle inconsistencies
between group descriptors and block/inode bitmaps by marking only the
group as unusable for new allocations, instead of marking the whole
filesystem in error.  Is that something that is of interest to a wider
audience?
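
To make the idea concrete, here is a minimal user-space sketch of the
approach, not the attached patch itself.  All names in it (struct
group_info, group_check(), pick_group()) are hypothetical and chosen only
for illustration; the real ext4 structures and checks differ.  It assumes
the inconsistency is detected as a mismatch between the free count in the
group descriptor and the free count derived from the bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group state; field names are illustrative only. */
struct group_info {
    unsigned free_blocks;   /* free count recorded in the group descriptor */
    unsigned bitmap_free;   /* free count derived from the block bitmap */
    bool corrupt;           /* set once an inconsistency has been seen */
};

/*
 * Compare the descriptor against the bitmap.  On a mismatch, flag only
 * this group instead of putting the whole filesystem into an error state.
 * Returns true if the group may still be used for new allocations.
 */
static bool group_check(struct group_info *g)
{
    if (g->free_blocks != g->bitmap_free)
        g->corrupt = true;
    return !g->corrupt;
}

/*
 * Return the index of the first group that is consistent and has free
 * blocks, or -1 if no group is usable.  Groups flagged as corrupt are
 * skipped; the remaining groups stay available.
 */
static int pick_group(struct group_info *groups, int n)
{
    assert(groups != NULL);
    for (int i = 0; i < n; i++) {
        if (group_check(&groups[i]) && groups[i].free_blocks > 0)
            return i;
    }
    return -1;
}
```

With three groups where the first is inconsistent (10 vs. 12 free), the
second is full and the third is healthy, pick_group() skips the first two
and returns index 2, and only the first group ends up flagged.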

Patch against RHEL7 is attached, but could be updated for newer kernels
if there is interest.

Cheers, Andreas

Attachment: ext4-corrupted-inode-block-bitmaps-handling-patches.patch (binary data)