On May 23, 2018, at 12:38 AM, Wang Shilong <wshilong@xxxxxxx> wrote: > Hello Andreas, > > Sorry for late reply, I've got some time to finish this: > > Andreas Dilger wrote: >> I think this looks pretty reasonable, but the replacement of ext4_error() >> on some of the error paths with ext4_warning() seems like there is a >> chance that an error might be "lost" if one of the underlying functions >> misses its call to ext4_mark_group_bitmap_corrupted()->save_error_info(). >> >> I looked at the patch and the previous one together, and it _looks_ like >> all of the error paths are handled at some point lower down in the stack, >> but this has the potential to break in the future if a change doesn't >> call ext4_mark_group_bitmap_corrupted() itself for some reason. >> >> One option would be instead of ext4_warning() (or dropping completely) >> the ext4_error() messages, use ext4_mark_group_bitmap_corrupted() and >> change that function to not print an error message if the inode bitmap >> or block bitmap is already flagged as corrupted. That gives us code >> safety (the higher-level error check/message will catch any unhandled >> errors from below), while also reducing console spam (duplicate errors >> will not be printed, and the "most specific" error will print once to >> the console). > > -> could you please more specific for this? I understood that we could > add the check in ext4_mark_group_bitmap_corrupted() that only call > ext4_error()/warning() if necessary, but I did not catch your other point, > you mean we should move ext4_mark_group_bitmap_corrupted() to higher-level > call? No, my point was to somehow ensure either ext4_mark_group_bitmap_corrupted() is called (marking the group corrupt and calling ext4_warning() at least once on that group), OR call ext4_error(). I think the 1/2 patch is totally OK, since it is just moving the ext4_mark_group_bitmap_corrupted() calls into ext4_group_locked_error(). I think there are a few things to do to improve this patch: - add a check in ext4_mark_group_bitmap_corrupted() if the bitmap is already marked corrupted then don't print anything at all - add a mount option like "bitmaps={error,warning}" to allow the user to decide if corrupt bitmaps should be considered a fatal error or not, and check this in ext4_mark_group_bitmap_corrupted() - rather than removing the ext4_error() calls in the higher code paths, add a check ext4_is_bitmap_corrupted() to check if the bitmap error flag for that group is set, and don't call ext4_error() if it is. That avoids hitting ext4_error(), but ensures that it will be called if the underlying code did not call ext4_mark_group_bitmap_corrupted() for some reason. Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP