Bugs item #2933400, was opened at 2010-01-16 15:35 Message generated for change (Comment added) made by masc82 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: MaSc82 (masc82) Assigned to: Nobody/Anonymous (nobody) Summary: virtio-blk io errors / data corruption on raw drives > 1 TB Initial Comment: When attaching raw drives > 1 TB, buffer io errors will most likely occur, filesystems get corrupted. Processes (like mkfs.ext4) might freeze completely when filesystems are created on the guest. Here's a typical log excerpt when using mkfs.ext4 on a 1.5 TB drive on a Ubuntu 9.10 guest: (kern.log) Jan 15 20:40:44 q kernel: [ 677.076602] Buffer I/O error on device vde, logical block 366283764 Jan 15 20:40:44 q kernel: [ 677.076607] Buffer I/O error on device vde, logical block 366283765 Jan 15 20:40:44 q kernel: [ 677.076611] Buffer I/O error on device vde, logical block 366283766 Jan 15 20:40:44 q kernel: [ 677.076616] Buffer I/O error on device vde, logical block 366283767 Jan 15 20:40:44 q kernel: [ 677.076621] Buffer I/O error on device vde, logical block 366283768 Jan 15 20:40:44 q kernel: [ 677.076626] Buffer I/O error on device vde, logical block 366283769 (messages) Jan 15 20:40:44 q kernel: [ 677.076534] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076541] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076546] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076599] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076604] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076609] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076613] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076618] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076623] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076628] lost page write due to I/O error on vde Jan 15 20:45:55 q Backgrounding to notify hosts... (The following entries will repeat infinitely, mkfs.ext4 will not exit and cannot be killed) Jan 15 20:49:27 q kernel: [ 1200.520096] mkfs.ext4 D 0000000000000000 0 1839 1709 0x00000000 Jan 15 20:49:27 q kernel: [ 1200.520101] ffff88004e157cb8 0000000000000082 ffff88004e157c58 0000000000015880 Jan 15 20:49:27 q kernel: [ 1200.520115] ffff88004ef6c7c0 0000000000015880 0000000000015880 0000000000015880 Jan 15 20:49:27 q kernel: [ 1200.520118] 0000000000015880 ffff88004ef6c7c0 0000000000015880 0000000000015880 Jan 15 20:49:27 q kernel: [ 1200.520123] Call Trace: Jan 15 20:49:27 q kernel: [ 1200.520157] [<ffffffff810da0f0>] ? sync_page+0x0/0x50 Jan 15 20:49:27 q kernel: [ 1200.520178] [<ffffffff815255f8>] io_schedule+0x28/0x40 Jan 15 20:49:27 q kernel: [ 1200.520182] [<ffffffff810da12d>] sync_page+0x3d/0x50 Jan 15 20:49:27 q kernel: [ 1200.520185] [<ffffffff81525b17>] __wait_on_bit+0x57/0x80 Jan 15 20:49:27 q kernel: [ 1200.520192] [<ffffffff810da29e>] wait_on_page_bit+0x6e/0x80 Jan 15 20:49:27 q kernel: [ 1200.520205] [<ffffffff81078650>] ? wake_bit_function+0x0/0x40 Jan 15 20:49:27 q kernel: [ 1200.520210] [<ffffffff810e44e0>] ? pagevec_lookup_tag+0x20/0x30 Jan 15 20:49:27 q kernel: [ 1200.520213] [<ffffffff810da745>] wait_on_page_writeback_range+0xf5/0x190 Jan 15 20:49:27 q kernel: [ 1200.520217] [<ffffffff810da807>] filemap_fdatawait+0x27/0x30 Jan 15 20:49:27 q kernel: [ 1200.520220] [<ffffffff810dacb4>] filemap_write_and_wait+0x44/0x50 Jan 15 20:49:27 q kernel: [ 1200.520235] [<ffffffff8114ba9f>] __sync_blockdev+0x1f/0x40 Jan 15 20:49:27 q kernel: [ 1200.520239] [<ffffffff8114bace>] sync_blockdev+0xe/0x10 Jan 15 20:49:27 q kernel: [ 1200.520241] [<ffffffff8114baea>] block_fsync+0x1a/0x20 Jan 15 20:49:27 q kernel: [ 1200.520249] [<ffffffff81142f26>] vfs_fsync+0x86/0xf0 Jan 15 20:49:27 q kernel: [ 1200.520252] [<ffffffff81142fc9>] do_fsync+0x39/0x60 Jan 15 20:49:27 q kernel: [ 1200.520255] [<ffffffff8114301b>] sys_fsync+0xb/0x10 Jan 15 20:49:27 q kernel: [ 1200.520271] [<ffffffff81011fc2>] system_call_fastpath+0x16/0x1b In my case I was switching to virtio at one point, but the behaviour didn't show until there was > 1 TB data on the filesystem. very dangerous. Tested using 2 different SATA controllers, 1.5 TB lvm/mdraid, single 1.5 TB drive and 2 TB lvm/mdraid. The behaviour does not occur with if=scsi or if=ide. #2914397 might be related: https://sourceforge.net/tracker/?func=detail&aid=2914397&group_id=180599&atid=893831 This blog post might also relate: http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/ CPU: Intel Xeon E5430 KVM: qemu-kvm-0.12.1.2 Kernel: 2.6.32.2, x86_64 Guest OS: Verified to occur on guests Ubuntu Linux 9.10 (64-bit) and Gentoo Linux (64-bit) Commandline (atm using ide instead of virtio for large drives as a workaround): qemu-system-x86_64 -S -M pc-0.11 -enable-kvm -m 1500 -smp 2 -name q -uuid 4684a449-0294-6a87-24a0-1f6d7c6e3eba -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/q.monitor,server,nowait -monitor chardev:monitor -boot c -drive cache=writeback,file=/dev/media1/q,if=virtio,index=0,boot=on -drive cache=writeback,file=/dev/media1/q_storage1,if=virtio,index=1 -drive cache=writeback,file=/dev/media2/q_storage2,if=ide,index=0 -drive cache=writeback,file=/dev/media3/q_storage3,if=virtio,index=2 -drive cache=writeback,file=/dev/media4/q_storage4,if=ide,index=1 -drive cache=writeback,file=/dev/media5/q_storage5,if=ide,index=2 -net nic,macaddr=52:54:00:12:34:59,vlan=0,model=virtio,name=virtio.0 -net tap,fd=18,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice tablet -vnc 0.0.0.0:2 -k de -vga cirrus ---------------------------------------------------------------------- >Comment By: MaSc82 (masc82) Date: 2010-03-06 10:11 Message: in 0.12.3 behaviour has been significantly improved, but seems it's not fixed. Tested with a 1.5 TB virtio raw drive, I created ext4 fs directly on the drive and rsynced data from a larger source to fill up the drive completely. Until the drive became full, everything looked good, no more buffer i/o errors or lost page writes as they were happening in 0.12.2. I was expecting rsync to abort at one point when disk became full, but it surprisingly continued quite a while, although df -h was already indicating there was no space left. Following messages began to show up in kernel.log. [45517.836999] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310728 [45517.847357] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310730 [45517.848044] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310731 [45517.853757] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310733 [45517.864262] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310734 [45517.935073] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310736 [45518.364535] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1441857 [45958.938625] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1441857 [45959.559208] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310736 [45959.562137] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310734 [45959.564764] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310733 [45959.568446] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310731 [45959.570541] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310730 [45959.580392] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310728 [46393.982995] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 8465: 28690 blocks in bitmap, 2047 in gd [46799.823463] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1441857 [46800.532939] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310736 [46800.543741] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310734 [46800.547021] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310733 [46800.548173] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310731 [46800.549303] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310730 [46800.561472] EXT4-fs error (device vdd): ext4_lookup: deleted inode referenced: 1310728 [47518.779879] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 8689: 28368 blocks in bitmap, 0 in gd [47686.658580] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 10822: 27802 blocks in bitmap, 1596 in gd [47689.313508] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 9212: 16198 blocks in bitmap, 1158 in gd [47697.030338] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 9473: 28686 blocks in bitmap, 1021 in gd [47726.639706] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 6825: 2047 blocks in bitmap, 1023 in gd [47782.529462] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 252: 358 blocks in bitmap, 102 in gd [47782.929821] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 378: 778 blocks in bitmap, 266 in gd [47782.929945] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 382: 1023 blocks in bitmap, 511 in gd [47783.073044] EXT4-fs error (device vdd): ext4_mb_generate_buddy: EXT4-fs: group 583: 683 blocks in bitmap, 171 in gd [47786.626901] EXT4-fs error (device vdd): ext4_mb_complex_scan_group: 18 free blocks as per group info. But got 82 blocks [47788.300039] EXT4-fs error (device vdd): ext4_mb_complex_scan_group: 18 free blocks as per group info. But got 82 blocks [47788.307045] EXT4-fs error (device vdd): ext4_mb_complex_scan_group: 18 free blocks as per group info. But got 82 blocks [47790.176599] EXT4-fs error (device vdd): ext4_mb_complex_scan_group: 18 free blocks as per group info. But got 82 blocks When rsync finally aborted (reporting no space left on device), the filesystem was corrupt and had to be repaired with fsck.ext4, which detected multiply-claimed blocks especially within most recently written files. ---------------------------------------------------------------------- Comment By: Sebastian (grisia) Date: 2010-01-30 15:03 Message: Same behavior here.... Using QEMU-KVM version 0.11.1 error dosn't appears. Host (Gentoo): Vanilla Kernel Sources 2.6.32.7 QEMU-KVM 0.12.2 Guest image is a LVM2 partition (~30Gb) Guest is using virtio-blk Guest (Gentoo): Vanilla Kernel Sources 2.6.32.7 Filesystem is Ext4 Log after copying lot of files: Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 33555759 Jan 30 14:06:02 jboss kernel: [ 40.177314] Buffer I/O error on device vdd1, logical block 4194462 Jan 30 14:06:02 jboss kernel: [ 40.177314] lost page write due to I/O error on vdd1 ...snip... Jan 30 14:06:02 jboss kernel: [ 40.177314] Buffer I/O error on device vdd1, logical block 4194471 Jan 30 14:06:02 jboss kernel: [ 40.177314] lost page write due to I/O error on vdd1 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 33556767 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 37748799 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 37749055 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 41943103 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 41943231 ...snip... Jan 30 14:06:02 jboss kernel: [ 40.535373] end_request: I/O error, dev vdd, sector 58734687 Jan 30 14:06:02 jboss kernel: [ 40.535421] end_request: I/O error, dev vdd, sector 58735695 Jan 30 14:06:02 jboss kernel: [ 40.535476] JBD: recovery failed Jan 30 14:06:02 jboss kernel: [ 40.535478] EXT4-fs (vdd1): error loading journal Jan 30 14:06:26 jboss kernel: [ 63.811349] EXT4-fs (vdd1): mounted filesystem with ordered data mode Jan 30 14:06:32 jboss kernel: [ 70.131957] EXT4-fs (vdc5): mounted filesystem with ordered data mode Jan 30 14:06:48 jboss kernel: [ 85.852773] JBD: barrier-based sync failed on vdd1-8 - disabling barriers Jan 30 14:07:08 jboss kernel: [ 106.391820] end_request: I/O error, dev vdd, sector 21283503 Jan 30 14:07:08 jboss kernel: [ 106.391823] __ratelimit: 12124 callbacks suppressed Jan 30 14:07:08 jboss kernel: [ 106.391826] Buffer I/O error on device vdd1, logical block 2660430 Jan 30 14:07:08 jboss kernel: [ 106.391828] lost page write due to I/O error on vdd1 Jan 30 14:07:08 jboss kernel: [ 106.391830] Buffer I/O error on device vdd1, logical block 2660431 Jan 30 14:07:08 jboss kernel: [ 106.391832] lost page write due to I/O error on vdd1 Jan 30 14:07:08 jboss kernel: [ 106.391834] Buffer I/O error on device vdd1, logical block 2660432 Jan 30 14:07:08 jboss kernel: [ 106.391835] lost page write due to I/O error on vdd1 ...snip... Jan 30 14:07:19 jboss kernel: [ 116.855081] end_request: I/O error, dev vdd, sector 38055999 Jan 30 14:07:19 jboss kernel: [ 116.855146] end_request: I/O error, dev vdd, sector 38056983 Jan 30 14:07:19 jboss kernel: [ 116.855207] end_request: I/O error, dev vdd, sector 38057935 Jan 30 14:07:19 jboss kernel: [ 116.855271] end_request: I/O error, dev vdd, sector 38058935 Jan 30 14:07:19 jboss kernel: [ 116.855339] end_request: I/O error, dev vdd, sector 38059943 Jan 30 14:07:19 jboss kernel: [ 116.855418] end_request: I/O error, dev vdd, sector 38060951 Jan 30 14:07:19 jboss kernel: [ 117.064320] JBD2: Detected IO errors while flushing file data on vdd1-8 Jan 30 14:07:19 jboss kernel: [ 117.064591] Aborting journal on device vdd1-8. Jan 30 14:07:19 jboss kernel: [ 117.064663] EXT4-fs error (device vdd1) in ext4_reserve_inode_write: Journal has aborted Jan 30 14:07:19 jboss kernel: [ 117.064921] EXT4-fs error (device vdd1) in ext4_dirty_inode: Journal has aborted Jan 30 14:07:19 jboss kernel: [ 117.065410] EXT4-fs error (device vdd1): ext4_journal_start_sb: Detected aborted journal Jan 30 14:07:19 jboss kernel: [ 117.065413] EXT4-fs (vdd1): Remounting filesystem read-only qemu -M pc -cpu host -name kvm.jboss -uuid 50259a3c-d6de-4c6c-811d-753f2df9658a -smp 1 -drive file=/dev/lg_virtual_machines/jboss,format=host_device,if=virtio,index=0,boot=on,cache=writeback -drive file=/dev/lg_virtual_machines_ext/jboss_swap,format=host_device,if=virtio,index=1,cache=writeback -drive file=/dev/lg_virtual_machines/yacy,format=host_device,if=virtio,index=2 -drive file=/dev/lg_virtual_machines_ext/yacy_backup,format=host_device,if=virtio,index=3 -net vde,vlan=0,name=jboss,sock=/var/run/vde2/vde_dmz -net nic,vlan=0,macaddr=54:52:02:77:5a:04,model=virtio,name=ifdmzjboss -k de -m 256 -parallel none -serial none -boot c -daemonize -vga std -nographic -balloon virtio -enable-kvm -pidfile /var/run/qemu/kvm.jboss.pid -vnc unix:/var/lib/qemu/vnc_displays/kvm.jboss.socket -monitor unix:/var/lib/qemu/monitors/kvm.jboss.socket,server,nowait ---------------------------------------------------------------------- Comment By: Avi Kivity (avik) Date: 2010-01-17 10:38 Message: 1 TB = 2^40 B = 2^31 sectors. Overflow? ---------------------------------------------------------------------- Comment By: MaSc82 (masc82) Date: 2010-01-16 15:59 Message: updated to prio9, since ppl moving from scsi/ide to virtio with existing filesystems will at one point experience corruption and data loss. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html