On Tue, Apr 15, 2014 at 01:24:26PM +0530, Amit Sahrawat wrote:
> Initially, in the normal write path when the disk was almost full, we
> got a hang on 'sync' because the flusher (which is busy in writepages)
> was not responding. Before the hung task, we also found logs like:
>
> EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1493, 0
> clusters in bitmap, 58339 in gd
> EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1000, 0
> clusters in bitmap, 3 in gd
> EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1425, 0
> clusters in bitmap, 1 in gd

These errors indicate that several block groups have a corrupt block
bitmap: block groups #1493, #1000, and #1425.  The fact that there are
0 free blocks/clusters in the bitmap indicates that the bitmap was all
zero's, which could be the potential cause of the corruption.

The other funny thing is the number of free blocks/clusters being
greater than 32768 in block group #1493.  Assuming a 4k block size,
that shouldn't be possible, since a block group's bitmap fits in a
single block and so can describe at most 4096 * 8 = 32768 clusters.
Can you send the output of "dumpe2fs -h /dev/sdXX" so we can take a
look at the file system parameters?

How much before the hung task did you see these messages?  I normally
recommend that the file system be set up to either panic the system,
or force the file system to be remounted read-only, when EXT4-fs error
messages are reported, since that means the file system is corrupted,
and further operation can cause more data to be lost.

> JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 0). There's
> a risk of filesystem corruption in case of system crash.
> JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 0). There's
> a risk of filesystem corruption in case of system crash.
> EXT4-fs (sda1): error count: 58
> EXT4-fs (sda1): initial error at 607: ext4_mb_generate_buddy:742
> EXT4-fs (sda1): last error at 58: ext4_mb_generate_buddy:742

The "607" and "58" in "at 607" and "at 58" are normally supposed to be
Unix time_t values.  That is, each is normally a number like
1397564866, and it can be decoded via:

% date -d @1397564866
Tue Apr 15 08:27:46 EDT 2014

The fact that these numbers are numerically so small means that the
time wasn't set correctly on your system.  Was this a test system
running under kvm without a proper real-time clock?

> When we analysed the problem, it occurred from the writepages path in
> ext4.  This is because of the difference between the free blocks
> reported by the cluster bitmap and the number of free blocks reported
> by the group descriptor.

Yes, indicating that the file system was corrupt.

> During ext4_fill_super, ext4 calculates the number of free blocks by
> reading all the descriptors in ext4_count_free_clusters and stores it
> in the percpu counter s_freeclusters_counter:
>
> ext4_count_free_clusters:
>	desc_count = 0;
>	for (i = 0; i < ngroups; i++) {
>		gdp = ext4_get_group_desc(sb, i, NULL);
>		if (!gdp)
>			continue;
>		desc_count += ext4_free_group_clusters(sb, gdp);
>	}
>	return desc_count;
>
> During the write_begin call, ext4 checks this s_freeclusters_counter
> to know whether there are free blocks present or not.  When the free
> blocks reported by the group descriptors are greater than the actual
> free blocks reported by the bitmaps, a call to write_begin can still
> succeed even if the free blocks represented by the bitmaps are 0.

Yes.  We used to have code that would optionally read every single
bitmap, and verify that the descriptor counts matched the values in
the bitmaps.  However, that was expensive, and it wasn't a full check
of all possible file system inconsistencies that could lead to data
loss, so we ultimately removed it.
If the file system is potentially corrupt, it is the system
administrator's responsibility to force an fsck run to make sure the
file system data structures are consistent.

> When searching for the relevant problem which occurs in this path, we
> found the patch set from Darrick which revolves around this problem:
>
> ext4: error out if verifying the block bitmap fails
> ext4: fix type declaration of ext4_validate_block_bitmap
> ext4: mark block group as corrupt on block bitmap error
> ext4: mark block group as corrupt on inode bitmap error
> ext4: mark group corrupt on group descriptor checksum
> ext4: don't count free clusters from a corrupt block group
>
> After applying the patch set and performing verification on a similar
> setup, we ran 'fsstress'.  But now it hangs at different points.
>
> In the current logs we got:
>
> EXT4-fs error (device sdb1): ext4_mb_generate_buddy:743: group 1,
> 20480 clusters in bitmap, 25443 in gd; block bitmap corrupt.
> JBD2: Spotted dirty metadata buffer (dev = sdb1, blocknr = 0). There's
> a risk of filesystem corruption in case of system crash.

OK, what version of the kernel are you using?  The patches that you
reference above have been in the upstream kernel since 3.12, so I'm
assuming you're not using the latest upstream kernel, but rather an
older kernel with some patches applied.  Hmmm, skipping ahead:

> Kernel Version: 3.8
> Test command:
> fsstress -p 10 -n 100 -l 100 -d /mnt/test_dir

There is clearly either some kernel bug or a hardware problem which is
causing the file system corruption.  Given that you are using a much
older kernel, it's quite likely that there is some bug that has been
fixed in a later version of the kernel (although we can't really rule
out a hardware problem without knowing much more about your setup).
Unfortunately, there have been a *large* number of changes since
version 3.8, and I can't remember all of the changes and bug fixes
that we might have made in the past year or more (v3.8 dates from
March 2013).

Something that might be helpful is for you to use xfstests.  That's a
much more thorough set of tests than fsstress alone, so if you must
use an antique version of the kernel, it will probably serve you much
better.  It includes fsstress, and much more besides.  More
importantly, there are times when a fix is identified in the commit
logs by the xfstests failure that it addresses, so that might help you
find the bug fix that you need to backport.

For your convenience, there is a simple test framework that makes it
relatively easy to build and run xfstests under KVM.  You can find it
here:

git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git

See the documentation found at:

https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/README

for more details.

I hope this helps,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html