Hi,

On Sat 20-08-11 01:51:49, Thilo-Alexander Ginkel wrote:
> while rsyncing a large amount (> 1TB) of data from an ext3 to an ext4
> on my machine [1], I encountered an issue where rsync and syslog
> eventually started consuming 100% CPU and my syslog was flooded [2]
> with error messages:
>
> -- 8< --
> > kernel: [101543.047293] b_state=0x00000029, b_size=[...]
> > [...]
> > kernel: [101543.047321] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [101543.047330] b_state=0x00000029, b_size=4096
> > [...]
> > kernel: [101543.047348] b_state=0x00000029, b_size=4096
> > kernel: [101543.047353] device blocksize: 4096
> > kernel: [101543.047359] __find_get_block_slow() failed. block=328204473, [...]
> > [...]
> > kernel: [101543.047404] b_state=0x00000029, b_size=4096
> > kernel: [101543.047409] device blocksize: 4096
> > kernel: [101543.047414] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > [...]
> > kernel: [101543.049967] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [101543.049975] b_state=0x00000029, b_size=4096
> > kernel: [101543.049980] device blocksize: 4096
> > kernel: [101543.049986] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> -- 8< --
>
> These are not preceded by any other error messages (about possible FS
> inconsistencies) as has been the case in the past when bugs related to
> this error message were reported.
>
> Judging by the block size, the possibly corrupt volume is the ext3 one
> (the ext4 volume has a block size of 2048).
>
> A forced fsck.ext{3,4} of the source and target partitions did not
> show any inconsistencies.
>
> Any ideas?

Something has corrupted your buffer head structure in memory (and we
then looped infinitely in __getblk_slow()). bh->b_blocknr was
0xC139000B9 while it should have been 0x139000B9 (the 5th byte has been
changed from 0x00 to 0x0C). It might be a hw fault, a buggy driver, or
some other bug - hard to say. You might want to run memtest for some
time, or enable some kernel debug options (DEBUG_PAGEALLOC, DEBUG_SLAB)
which might catch the code causing the corruption (this assumes it's at
least occasionally reproducible and you are willing to take the
performance hit)...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
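
For anyone who wants to check the arithmetic behind the diagnosis above, here is a minimal sketch in Python. The two numbers are taken from the log (block= and b_blocknr=); the helper name differing_bytes is made up for illustration and is not anything from the kernel.

```python
# Values copied from the log messages quoted above.
block = 328204473          # block number the kernel was looking for
b_blocknr = 51867812025    # value actually found in bh->b_blocknr


def differing_bytes(expected, actual, width=8):
    """Return (position, expected_byte, actual_byte) for each byte that
    differs. Position 1 is the least significant byte.
    Hypothetical helper, for illustration only."""
    diffs = []
    for i in range(width):
        e = (expected >> (8 * i)) & 0xFF
        a = (actual >> (8 * i)) & 0xFF
        if e != a:
            diffs.append((i + 1, e, a))
    return diffs


print(hex(block))                          # 0x139000b9
print(hex(b_blocknr))                      # 0xc139000b9
print(differing_bytes(block, b_blocknr))   # [(5, 0, 12)]
print(bin(block ^ b_blocknr).count("1"))   # 2
```

So the low four bytes match the expected block number exactly and only the 5th byte differs (0x00 vs 0x0C), i.e. two bits are set that should not be.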