On Sun, 2012-07-01 at 23:14 +0100, Imran Chaudhry wrote:
> Package: linux-2.6
> Version: 2.6.32-45
> Severity: normal
>
> Kernel bug observed in syslog when performing an rsync operation. I
> use rsnapshot and I believe an rsnapshot operation "conflicted" or
> "interfered" somehow with my manual rsync command. The source and
> destination are USB HDDs with ext4 filesystems. After the kernel bug
> was observed I discovered the source filesystem had a corrupt
> filesystem. If it is relevant I was using the rsync command with
> --hard-links and I also observed messages of this sort:
> "[1075483.039915] EXT4-fs error (device sdb1): htree_dirblock_to_tree:
> bad entry in directory #7143723: directory entry across blocks -
> block=34323866offset=0(0), inode=135151872, rec_len=66180,
> name_len=66" and "Jul 1 06:33:06 altair kernel: [1075335.376996]
> EXT4-fs error (device sdb1): ext4_lookup: deleted inode referenced:
> 8954048".

Sorry to hear this. I cannot recommend using ext4 in Linux 2.6.32.

> Relevant kernel log trace with bug:
> Jul 1 05:37:53 altair kernel: [1072022.349172] ------------[ cut here ]------------
> Jul 1 05:37:53 altair kernel: [1072022.352027] kernel BUG at /build/buildd-linux-2.6_2.6.32-45-i386-yQfQSv/linux-2.6-2.6.32/debian/build/source_i386_none/fs/ext4/extents.c:1873!
> Jul 1 05:37:53 altair kernel: [1072022.352027] invalid opcode: 0000 [#1] SMP
> Jul 1 05:37:53 altair kernel: [1072022.352027] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:02:09.1/usb4/4-0:1.0/bInterfaceProtocol
> Jul 1 05:37:53 altair kernel: [1072022.352027] Modules linked in: xt_multiport iptable_filter ip_tables x_tables fuse nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ext4 jbd2 crc16 loop raid1 md_mod snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm i2c_i801 snd_timer shpchp snd psmouse evdev soundcore parport_pc parport serio_raw i2c_core snd_page_alloc pcspkr pci_hotplug rng_core processor button ext3 jbd mbcache usb_storage sd_mod crc_t10dif ata_generic ata_piix uhci_hcd e100 libata ehci_hcd thermal floppy r8169 mii usbcore nls_base scsi_mod thermal_sys [last unloaded: scsi_wait_scan]
> Jul 1 05:37:53 altair kernel: [1072022.352027]
> Jul 1 05:37:53 altair kernel: [1072022.352027] Pid: 31553, comm: rsync Not tainted (2.6.32-5-686 #1) Deskpro
> Jul 1 05:37:53 altair kernel: [1072022.352027] EIP: 0060:[<e0ea5b00>] EFLAGS: 00010246 CPU: 0
> Jul 1 05:37:53 altair kernel: [1072022.352027] EIP is at ext4_ext_get_blocks+0x286/0x1916 [ext4]
> Jul 1 05:37:53 altair kernel: [1072022.352027] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> Jul 1 05:37:53 altair kernel: [1072022.352027] ESI: 00000000 EDI: db1216f4 EBP: 00000000 ESP: dfad7ad0
[...]

This specific failure mode seems to have been made possible by:

commit 731eb1a03a8445cde2cb23ecfb3580c6fa7bb690
Author: Akinobu Mita <akinobu.mita@xxxxxxxxx>
Date:   Wed Mar 3 23:55:01 2010 -0500

    ext4: consolidate in_range() definitions

which was backported into a stable update.

If the 'first' and 'len' arguments to in_range() are both 0 and either
of them is unsigned, it wrongly returns true. This means that:

	if (in_range(iblock, ee_block, ee_len)) {
		...
		ext4_ext_put_in_cache(inode, ee_block, ee_len, ee_start,
				      EXT4_EXT_CACHE_EXTENT);

may pass ee_len == 0 to ext4_ext_put_in_cache(), triggering the BUG_ON
there. Maybe that's just not a valid case so this doesn't matter, but
it seems like it might be possible with a corrupt filesystem?

Anyway, I think the proper definition of in_range() is:

#define in_range(b, first, len)	((b) >= (first) && ((b) - (first)) < (len))

Ben.

-- 
Ben Hutchings
73.46% of all statistics are made up.
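
For illustration, the wrap-around described above can be reproduced
outside the kernel with a small standalone sketch. The consolidated
macro is not quoted in this message, so the in_range_old() form below
is an assumption about what that commit introduced; in_range_new() is
the definition proposed above. The wrapper and check function names
are hypothetical, and plain unsigned int stands in for the kernel's
block number types.

```c
#include <assert.h>

/* ASSUMED form of the consolidated macro; it is not quoted in the thread. */
#define in_range_old(b, first, len) \
	((b) >= (first) && (b) <= (first) + (len) - 1)

/* Definition proposed in the message above. */
#define in_range_new(b, first, len) \
	((b) >= (first) && ((b) - (first)) < (len))

/* Hypothetical wrappers that force unsigned arithmetic on all operands. */
static int old_in_range_u32(unsigned int b, unsigned int first,
			    unsigned int len)
{
	return in_range_old(b, first, len);
}

static int new_in_range_u32(unsigned int b, unsigned int first,
			    unsigned int len)
{
	return in_range_new(b, first, len);
}

/* Hypothetical self-check; call it from a test program's main(). */
static void check_in_range(void)
{
	/*
	 * first == 0 and len == 0: in the old form, 0 + 0 - 1 wraps to
	 * UINT_MAX, so every block number appears to be in range.
	 */
	assert(old_in_range_u32(0, 0, 0) == 1);
	assert(old_in_range_u32(7, 0, 0) == 1);

	/* The proposed form treats a zero-length range as empty. */
	assert(new_in_range_u32(0, 0, 0) == 0);
	assert(new_in_range_u32(7, 0, 0) == 0);

	/* Both forms agree on a non-empty range covering blocks 3..6. */
	assert(old_in_range_u32(5, 3, 4) == 1);
	assert(new_in_range_u32(5, 3, 4) == 1);
	assert(old_in_range_u32(7, 3, 4) == 0);
	assert(new_in_range_u32(7, 3, 4) == 0);
	assert(old_in_range_u32(2, 3, 4) == 0);
	assert(new_in_range_u32(2, 3, 4) == 0);
}
```

Calling check_in_range() should return without tripping an assertion.
The proposed form avoids computing first + len - 1 at all, so there is
no subtraction that can wrap when len is 0.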