On Mon, Apr 9, 2018 at 4:44 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> On Mon, Apr 09, 2018 at 10:25:46AM -0700, Liu Bo wrote:
>> > (e) there're errors about reading this bitmap(group 8383) shown in the log,
>> > crash> grep group e4b.txt
>> > bd_group = 8383
>> >
>> > however when it comes to BUG_ON(k >= max), reading this bitmap has
>> > been successful, and it is the inconsistence between ->bb_counters
>> > and the buddy bitmap that ends up with the crash, but if the buddy
>> > bitmap was regenerated, bb_counters should match with the buddy
>> > bitmap.
>
> What probably happened is that the page containing actual allocation
> bitmap was pushed out of memory due to memory pressure. However, the
> buddy bitmap was still cached in memory. That's actually quite
> possible since the buddy bitmap will often be referenced more
> frequently than the allocation bitmap (for example, while searching
> for free space of a specific size, and then having that block group
> skipped when it's not available).
>
> Since there was an I/O error reading the allocation bitmap, the buffer
> is not valid. So it's not surprising that the BUG_ON(k >= max) is
> getting triggered.
>
> It's of course not desirable. What should happen is that once we
> realize that the allocation bitmap can't be read, we should mark the
> block group as not being eligible for allocations via the
> EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, to avoid the BUG_ON from
> triggering.
>

Sounds good.

> I'll put it on my TODO list. Or feel free to try your hand at making
> the change yourself, versus the latest upstream kernel, and send a
> proposed patch to the linux-ext4@xxxxxxxxxxxxxxx mailing list.
>

Sure, I'll try to make a patch.

thanks,
liubo
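
For illustration only, the idea discussed above (flag a block group once its
allocation bitmap can't be read, then skip that group during allocation
instead of tripping an assertion later) could be sketched in userspace C
roughly as follows. This is not the ext4 code and not the proposed patch:
struct group_info, GROUP_BBITMAP_CORRUPT, load_group_bitmap(), pick_group()
and demo_read_bitmap() are all hypothetical stand-ins made up for this
example, and the bitmap read is simulated by a callback.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; a stand-in, not the kernel's definition. */
#define GROUP_BBITMAP_CORRUPT (1u << 0)

/* Hypothetical per-group bookkeeping, loosely modelled on the discussion. */
struct group_info {
    unsigned int state;        /* flag bits such as GROUP_BBITMAP_CORRUPT */
    unsigned int free_blocks;  /* cached count of free blocks in the group */
};

/* Simulated bitmap read: returns 0 on success, a negative errno on failure. */
typedef int (*read_bitmap_fn)(unsigned int group);

/* Demo reader (assumption for this example): group 1 always fails with EIO. */
int demo_read_bitmap(unsigned int group)
{
    return group == 1 ? -EIO : 0;
}

/*
 * Try to read the allocation bitmap for one group. On failure, flag the
 * group so that no later allocation attempt trusts its cached state.
 */
int load_group_bitmap(struct group_info *gi, unsigned int group,
                      read_bitmap_fn read_bitmap)
{
    int err = read_bitmap(group);

    if (err) {
        gi->state |= GROUP_BBITMAP_CORRUPT;
        return err;
    }
    return 0;
}

/* A group is eligible only if it is not flagged and has free blocks. */
bool group_usable(const struct group_info *gi)
{
    return !(gi->state & GROUP_BBITMAP_CORRUPT) && gi->free_blocks > 0;
}

/*
 * Return the index of the first group that is usable and whose bitmap
 * loads successfully, or -1 if there is none. Groups whose bitmap read
 * fails are flagged and skipped rather than treated as a fatal error.
 */
int pick_group(struct group_info *groups, size_t ngroups,
               read_bitmap_fn read_bitmap)
{
    for (size_t g = 0; g < ngroups; g++) {
        if (!group_usable(&groups[g]))
            continue;
        if (load_group_bitmap(&groups[g], (unsigned int)g, read_bitmap))
            continue;
        return (int)g;
    }
    return -1;
}
```

With three groups where group 0 has no free blocks and group 1's bitmap
read fails, pick_group(groups, 3, demo_read_bitmap) flags group 1 and
returns 2. A second call skips group 1 up front, without retrying the read.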