On Apr 19, 2018, at 7:39 AM, Wang Shilong <wangshilong1991@xxxxxxxxx> wrote: > > From: Wang Shilong <wshilong@xxxxxxx> > > Once we hit bitmaps errors, Set corruppted bitmaps > error bit will prevent us further allocation. > > Don't RO in default will make filesystem still usable > and error state is recorded in superblock, next > time offline e2fsck will fix errors. > > Suggested-by: Andreas Dilger <adilger.kernel@xxxxxxxxx> > Signed-off-by: Wang Shilong <wshilong@xxxxxxx> > --- I think this looks pretty reasonable, but the replacement of ext4_error() on some of the error paths with ext4_warning() seems like there is a chance that an error might be "lost" if one of the underlying functions misses its call to ext4_mark_group_bitmap_corrupted()->save_error_info(). I looked at the patch and the previous one together, and it _looks_ like all of the error paths are handled at some point lower down in the stack, but this has the potential to break in the future if a change doesn't call ext4_mark_group_bitmap_corrupted() itself for some reason. One option would be instead of ext4_warning() (or dropping completely) the ext4_error() messages, use ext4_mark_group_bitmap_corrupted() and change that function to not print an error message if the inode bitmap or block bitmap is already flagged as corrupted. That gives us code safety (the higher-level error check/message will catch any unhandled errors from below), while also reducing console spam (duplicate errors will not be printed, and the "most specific" error will print once to the console). Thoughts? Also, it would be good to get some feedback from Ted and others whether changing these errors to warnings is OK as-is (we'll still get an e2fsck on restart, but no immediate filesystem abort if corruption is detected), or if there needs to be a mount option (== superblock flag) to determine this behaviour? IMHO, the warning-with-eventual-repair is reasonable, so long as we don't use these groups for any new allocations that might spread corruption further. However, I could see some people may want to flag any notice of corruption immediately for repair, in case this is just one sign of a larger problem that may not otherwise be detected. Also, do we need to track how many bitmaps are corrupted, and fail the whole filesystem once we have too many corruptions (i.e. filesystem has no free blocks in uncorrupted groups)? We are already reducing the free blocks count of the filesystem, so the user will get ENOSPC at this point, but it might be better to finally give up at some point. Cheers, Andreas > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index ed5b988d3f8e..0fa16de96d8a 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -2563,6 +2563,8 @@ void __ext4_grp_locked_error(const char *, unsigned int, > struct super_block *, ext4_group_t, > unsigned long, ext4_fsblk_t, unsigned int, > const char *, ...); > +void save_error_info(struct super_block *sb, const char *func, > + unsigned int line); > > #define EXT4_ERROR_INODE(inode, fmt, a...) \ > ext4_error_inode((inode), __func__, __LINE__, 0, (fmt), ## a) > @@ -2596,9 +2598,10 @@ void __ext4_grp_locked_error(const char *, unsigned int, > flags, fmt, ##__VA_ARGS__) > #define ext4_mark_group_bitmap_corrupted(sb, group, flags, fmt, ...) \ > do { \ > - __ext4_error(sb, __func__, __LINE__, \ > - fmt, ##__VA_ARGS__); \ > __ext4_mark_group_bitmap_corrupted(sb, group, flags); \ > + __ext4_warning(sb, __func__, __LINE__, \ > + fmt, ##__VA_ARGS__); \ > + save_error_info(sb, __func__, __LINE__); \ > } while (0) > > > @@ -2648,9 +2651,11 @@ do { \ > } while (0) > #define ext4_mark_group_bitmap_corrupted(sb, group, flags, fmt, ...) \ > do { \ > - no_printk(fmt, ##__VA_ARGS__); \ > - __ext4_error(sb, "", 0, " "); \ > __ext4_mark_group_bitmap_corrupted(sb, group, flags); \ > + no_printk(fmt, ##__VA_ARGS__); \ > + __ext4_warning(sb, __func__, __LINE__, \ > + fmt, ##__VA_ARGS__); \ > + save_error_info(sb, __func__, __LINE__); \ > } while (0) > > #endif > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 065bce671de4..39ce56d76dd4 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -3908,8 +3908,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, > bitmap_bh = ext4_read_block_bitmap(sb, group); > if (IS_ERR(bitmap_bh)) { > err = PTR_ERR(bitmap_bh); > - ext4_error(sb, "Error %d reading block bitmap for %u", > - err, group); > return 0; > } > > @@ -4076,16 +4074,15 @@ void ext4_discard_preallocations(struct inode *inode) > err = ext4_mb_load_buddy_gfp(sb, group, &e4b, > GFP_NOFS|__GFP_NOFAIL); > if (err) { > - ext4_error(sb, "Error %d loading buddy information for %u", > - err, group); > + ext4_warning(sb, > + "Error %d loading buddy information for %u", > + err, group); > continue; > } > > bitmap_bh = ext4_read_block_bitmap(sb, group); > if (IS_ERR(bitmap_bh)) { > err = PTR_ERR(bitmap_bh); > - ext4_error(sb, "Error %d reading block bitmap for %u", > - err, group); > ext4_mb_unload_buddy(&e4b); > continue; > } > @@ -4339,8 +4336,9 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb, > err = ext4_mb_load_buddy_gfp(sb, group, &e4b, > GFP_NOFS|__GFP_NOFAIL); > if (err) { > - ext4_error(sb, "Error %d loading buddy information for %u", > - err, group); > + ext4_warning(sb, > + "Error %d loading buddy information for %u", > + err, group); > continue; > } > ext4_lock_group(sb, group); > @@ -4820,11 +4818,8 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, > } > count_clusters = EXT4_NUM_B2C(sbi, count); > bitmap_bh = ext4_read_block_bitmap(sb, block_group); > - if (IS_ERR(bitmap_bh)) { > - err = PTR_ERR(bitmap_bh); > - bitmap_bh = NULL; > - goto error_return; > - } > + if (IS_ERR(bitmap_bh)) > + return; > gdp = ext4_get_group_desc(sb, block_group, &gd_bh); > if (!gdp) { > err = -EIO; > @@ -5002,11 +4997,8 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, > } > > bitmap_bh = ext4_read_block_bitmap(sb, block_group); > - if (IS_ERR(bitmap_bh)) { > - err = PTR_ERR(bitmap_bh); > - bitmap_bh = NULL; > - goto error_return; > - } > + if (IS_ERR(bitmap_bh)) > + return PTR_ERR(bitmap_bh); > > desc = ext4_get_group_desc(sb, block_group, &gd_bh); > if (!desc) { > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 96981df03385..3ceda68344ae 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -342,8 +342,8 @@ static void __save_error_info(struct super_block *sb, const char *func, > le32_add_cpu(&es->s_error_count, 1); > } > > -static void save_error_info(struct super_block *sb, const char *func, > - unsigned int line) > +void save_error_info(struct super_block *sb, const char *func, > + unsigned int line) > { > __save_error_info(sb, func, line); > ext4_commit_super(sb, 1); > @@ -743,7 +743,7 @@ __acquires(bitlock) > if (flags) > __ext4_mark_group_bitmap_corrupted(sb, grp, flags); > > - if (test_opt(sb, ERRORS_CONT)) { > + if (test_opt(sb, ERRORS_CONT) || flags) { > ext4_commit_super(sb, 0); > return; > } > -- > 2.14.3 > Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP