On Mon, Apr 09, 2018 at 10:25:46AM -0700, Liu Bo wrote:
> > (e) there're errors about reading this bitmap(group 8383) shown in the log,
> > crash> grep group e4b.txt
> > bd_group = 8383
> >
> > however when it comes to BUG_ON(k >= max), reading this bitmap has
> > been successful, and it is the inconsistence between ->bb_counters
> > and the buddy bitmap that ends up with the crash, but if the buddy
> > bitmap was regenerated, bb_counters should match with the buddy
> > bitmap.

What probably happened is that the page containing the actual
allocation bitmap was pushed out of memory due to memory pressure.
However, the buddy bitmap was still cached in memory.  That's actually
quite possible, since the buddy bitmap will often be referenced more
frequently than the allocation bitmap (for example, while searching
for free space of a specific size, and then having that block group
skipped when it's not available).

Since there was an I/O error reading the allocation bitmap, the buffer
is not valid.  So it's not surprising that the BUG_ON(k >= max) is
getting triggered.

It's of course not desirable.  What should happen is that once we
realize that the allocation bitmap can't be read, we should mark the
block group as not being eligible for allocations via the
EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, to avoid the BUG_ON from
triggering.

I'll put it on my TODO list.  Or feel free to try your hand at making
the change yourself, against the latest upstream kernel, and send a
proposed patch to the linux-ext4@xxxxxxxxxxxxxxx mailing list.

Cheers,

					- Ted
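
To make the proposed behavior concrete, here is a minimal userspace
sketch of the idea: when the allocation bitmap can't be read, flag the
group and skip it rather than carrying on with an invalid buffer.  This
is a toy model and not kernel code.  The names struct toy_group,
toy_read_block_bitmap(), toy_group_usable() and toy_pick_group() are
all hypothetical, and the bitmap_corrupt field only stands in for what
EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT would record in the real
per-group state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-group state; NOT struct ext4_group_info. */
struct toy_group {
	unsigned int group_no;
	bool bitmap_corrupt;      /* models EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT */
	bool bitmap_readable;     /* false simulates an I/O error on the bitmap */
	unsigned int free_blocks; /* cached counter; may outlive the bitmap page */
};

/*
 * Simulated read of the allocation bitmap.  On failure, flag the group
 * so later allocation attempts skip it, and return -EIO.
 */
static int toy_read_block_bitmap(struct toy_group *grp)
{
	if (!grp->bitmap_readable) {
		grp->bitmap_corrupt = true;
		return -EIO;
	}
	return 0;
}

/* A group is a candidate only if it is not flagged and has enough space. */
static bool toy_group_usable(const struct toy_group *grp, unsigned int needed)
{
	return !grp->bitmap_corrupt && grp->free_blocks >= needed;
}

/*
 * Return the index of the first group we can allocate from, or -1.
 * A failed bitmap read flags the group and moves on instead of asserting.
 */
static int toy_pick_group(struct toy_group *groups, size_t n,
			  unsigned int needed)
{
	assert(groups != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!toy_group_usable(&groups[i], needed))
			continue;
		if (toy_read_block_bitmap(&groups[i]) != 0)
			continue; /* group is now flagged; try the next one */
		return (int)i;
	}
	return -1;
}
```

With two groups where the first one's bitmap read fails,
toy_pick_group() flags the first group and returns the index of the
second.  Later calls skip the flagged group up front, because
toy_group_usable() checks the flag before any read is attempted.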