On Wed, 15 Aug 2012, Lukáš Czerner wrote:

> Date: Wed, 15 Aug 2012 11:17:57 +0200 (CEST)
> From: Lukáš Czerner <lczerner@xxxxxxxxxx>
> To: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Lukas Czerner <lczerner@xxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>,
>     "Linux Kernel Mailinlinux-ext4@xxxxxxxxxxxxxxxx List"
>     <linux-kernel@xxxxxxxxxxxxxxx>, linux-ext4@xxxxxxxxxxxxxxx
> Subject: Re: ext4fs error
>     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
>     (with repro)
>
> On Thu, 9 Aug 2012, Theodore Ts'o wrote:
>
> > Date: Thu, 9 Aug 2012 13:06:40 -0400
> > From: Theodore Ts'o <tytso@xxxxxxx>
> > To: Lukas Czerner <lczerner@xxxxxxxxxx>
> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>,
> >     "Linux Kernel Mailinlinux-ext4@xxxxxxxxxxxxxxxx List"
> >     <linux-kernel@xxxxxxxxxxxxxxx>, linux-ext4@xxxxxxxxxxxxxxx
> > Subject: Re: ext4fs error
> >     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
> >     (with repro)
> >
> > On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> > > Here is how to reproduce it. It happens during fstrim. I found other
> > > occurrences of the error in the mailing list, but they were not related
> > > to trim so they may be something different.
> > >
> > >     modprobe scsi_debug dev_size_mb=256 lbpws=1
> > >     dd if=/dev/zero of=/dev/sdb bs=1M
> > >     fdisk /dev/sdb
> > >        >> create a new partition accepting all defaults
> > >     fdisk -lu /dev/sdb|tail -1
> > >        >> should show: /dev/sdb1 57 524285 262114+ 83 Linux
> > >
> > >     mkfs.ext4 /dev/sdb1
> > >     mkdir test
> > >     mount /dev/sdb1 test
> > >     fstrim ./test
> >
> > I can confirm that this accurately reproduces file system corruption
> > using a 3.5 kernel. It looks like some block allocation bitmap blocks
> > is getting trimmed when it shouldn't have been. Lukas, can you take a
> > look at this?
> >
> >                                         - Ted
>
> Hi Ted,
>
> sorry for the delay, I've just got back from my vacation. I'll take
> a look at it.
>
> Thanks!
> -Lukas

This does not seem to be an ext4 problem. The ext4 code does not appear
to be able to discard blocks which are allocated. Moreover, I was not
able to reproduce the problem on a loop device with the same settings as
the reported scsi_debug device (a file system with a 1024-byte block
size on a 256MB image residing on a file system with a 1024-byte block
size).

After a little bit of tracing with systemtap and blktrace, ext4 does not
seem to be doing anything wrong, and yet part of the block bitmap gets
trimmed. This led me to the scsi_debug driver itself, and indeed it
seems there is an off-by-one bug in unmap_region() which is causing the
problem.

Here is the patch which fixes the problem for me. I'll resend the proper
patch in a bit.

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 182d5a5..f4cc413 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -2054,7 +2054,7 @@ static void unmap_region(sector_t lba, unsigned int len)
 		block = lba + alignment;
 		rem = do_div(block, granularity);
 
-		if (rem == 0 && lba + granularity <= end && block < map_size) {
+		if (rem == 0 && lba + granularity < end && block < map_size) {
 			clear_bit(block, map_storep);
 			if (scsi_debug_lbprz)
 				memset(fake_storep +

Thanks!
-Lukas
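
For illustration only, the effect of the one-character change in the condition can be seen in a small userspace model of the loop. This is a sketch under stated assumptions, not the driver code: the function name model_unmap_count() is hypothetical, `end` is assumed to be lba + len, the alignment offset is assumed to be zero, and the map_size check and the memset are left out. It only counts how many granules each form of the comparison would clear, and does not by itself establish which form is correct for the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the loop in unmap_region().
 * Assumptions (not taken from the driver): end = lba + len, the
 * alignment offset is zero, and there is no map_size limit.
 *
 * strict == false models the original test "lba + granularity <= end".
 * strict == true  models the patched test  "lba + granularity <  end".
 *
 * Returns the number of granules the loop would clear.
 */
static size_t model_unmap_count(uint64_t lba, unsigned int len,
                                unsigned int granularity, bool strict)
{
    const uint64_t end = lba + len;
    size_t cleared = 0;

    assert(granularity > 0);

    while (lba < end) {
        /* Offset of lba inside its granule; 0 means granule-aligned. */
        uint64_t rem = lba % granularity;
        bool fits = strict ? (lba + granularity < end)
                           : (lba + granularity <= end);

        if (rem == 0 && fits)
            cleared++;

        /* Advance to the start of the next granule. */
        lba += granularity - rem;
    }
    return cleared;
}
```

In this model, a request for 8 blocks starting at LBA 0 with a granularity of 1 clears 8 granules with "<=" and 7 with "<". The two forms differ only when a granule ends exactly at `end`, so for a request whose end is not granule-aligned (for example LBA 2, length 8, granularity 4) both clear the same single granule.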