Bugs item #2933400, was opened at 2010-01-16 15:35 Message generated for change (Comment added) made by grisia You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: MaSc82 (masc82) Assigned to: Nobody/Anonymous (nobody) Summary: virtio-blk io errors / data corruption on raw drives > 1 TB Initial Comment: When attaching raw drives > 1 TB, buffer io errors will most likely occur, filesystems get corrupted. Processes (like mkfs.ext4) might freeze completely when filesystems are created on the guest. Here's a typical log excerpt when using mkfs.ext4 on a 1.5 TB drive on a Ubuntu 9.10 guest: (kern.log) Jan 15 20:40:44 q kernel: [ 677.076602] Buffer I/O error on device vde, logical block 366283764 Jan 15 20:40:44 q kernel: [ 677.076607] Buffer I/O error on device vde, logical block 366283765 Jan 15 20:40:44 q kernel: [ 677.076611] Buffer I/O error on device vde, logical block 366283766 Jan 15 20:40:44 q kernel: [ 677.076616] Buffer I/O error on device vde, logical block 366283767 Jan 15 20:40:44 q kernel: [ 677.076621] Buffer I/O error on device vde, logical block 366283768 Jan 15 20:40:44 q kernel: [ 677.076626] Buffer I/O error on device vde, logical block 366283769 (messages) Jan 15 20:40:44 q kernel: [ 677.076534] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076541] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076546] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076599] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076604] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076609] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076613] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076618] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076623] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076628] lost page write due to I/O error on vde Jan 15 20:45:55 q Backgrounding to notify hosts... (The following entries will repeat infinitely, mkfs.ext4 will not exit and cannot be killed) Jan 15 20:49:27 q kernel: [ 1200.520096] mkfs.ext4 D 0000000000000000 0 1839 1709 0x00000000 Jan 15 20:49:27 q kernel: [ 1200.520101] ffff88004e157cb8 0000000000000082 ffff88004e157c58 0000000000015880 Jan 15 20:49:27 q kernel: [ 1200.520115] ffff88004ef6c7c0 0000000000015880 0000000000015880 0000000000015880 Jan 15 20:49:27 q kernel: [ 1200.520118] 0000000000015880 ffff88004ef6c7c0 0000000000015880 0000000000015880 Jan 15 20:49:27 q kernel: [ 1200.520123] Call Trace: Jan 15 20:49:27 q kernel: [ 1200.520157] [<ffffffff810da0f0>] ? sync_page+0x0/0x50 Jan 15 20:49:27 q kernel: [ 1200.520178] [<ffffffff815255f8>] io_schedule+0x28/0x40 Jan 15 20:49:27 q kernel: [ 1200.520182] [<ffffffff810da12d>] sync_page+0x3d/0x50 Jan 15 20:49:27 q kernel: [ 1200.520185] [<ffffffff81525b17>] __wait_on_bit+0x57/0x80 Jan 15 20:49:27 q kernel: [ 1200.520192] [<ffffffff810da29e>] wait_on_page_bit+0x6e/0x80 Jan 15 20:49:27 q kernel: [ 1200.520205] [<ffffffff81078650>] ? wake_bit_function+0x0/0x40 Jan 15 20:49:27 q kernel: [ 1200.520210] [<ffffffff810e44e0>] ? pagevec_lookup_tag+0x20/0x30 Jan 15 20:49:27 q kernel: [ 1200.520213] [<ffffffff810da745>] wait_on_page_writeback_range+0xf5/0x190 Jan 15 20:49:27 q kernel: [ 1200.520217] [<ffffffff810da807>] filemap_fdatawait+0x27/0x30 Jan 15 20:49:27 q kernel: [ 1200.520220] [<ffffffff810dacb4>] filemap_write_and_wait+0x44/0x50 Jan 15 20:49:27 q kernel: [ 1200.520235] [<ffffffff8114ba9f>] __sync_blockdev+0x1f/0x40 Jan 15 20:49:27 q kernel: [ 1200.520239] [<ffffffff8114bace>] sync_blockdev+0xe/0x10 Jan 15 20:49:27 q kernel: [ 1200.520241] [<ffffffff8114baea>] block_fsync+0x1a/0x20 Jan 15 20:49:27 q kernel: [ 1200.520249] [<ffffffff81142f26>] vfs_fsync+0x86/0xf0 Jan 15 20:49:27 q kernel: [ 1200.520252] [<ffffffff81142fc9>] do_fsync+0x39/0x60 Jan 15 20:49:27 q kernel: [ 1200.520255] [<ffffffff8114301b>] sys_fsync+0xb/0x10 Jan 15 20:49:27 q kernel: [ 1200.520271] [<ffffffff81011fc2>] system_call_fastpath+0x16/0x1b In my case I was switching to virtio at one point, but the behaviour didn't show until there was > 1 TB data on the filesystem. very dangerous. Tested using 2 different SATA controllers, 1.5 TB lvm/mdraid, single 1.5 TB drive and 2 TB lvm/mdraid. The behaviour does not occur with if=scsi or if=ide. #2914397 might be related: https://sourceforge.net/tracker/?func=detail&aid=2914397&group_id=180599&atid=893831 This blog post might also relate: http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/ CPU: Intel Xeon E5430 KVM: qemu-kvm-0.12.1.2 Kernel: 2.6.32.2, x86_64 Guest OS: Verified to occur on guests Ubuntu Linux 9.10 (64-bit) and Gentoo Linux (64-bit) Commandline (atm using ide instead of virtio for large drives as a workaround): qemu-system-x86_64 -S -M pc-0.11 -enable-kvm -m 1500 -smp 2 -name q -uuid 4684a449-0294-6a87-24a0-1f6d7c6e3eba -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/q.monitor,server,nowait -monitor chardev:monitor -boot c -drive cache=writeback,file=/dev/media1/q,if=virtio,index=0,boot=on -drive cache=writeback,file=/dev/media1/q_storage1,if=virtio,index=1 -drive cache=writeback,file=/dev/media2/q_storage2,if=ide,index=0 -drive cache=writeback,file=/dev/media3/q_storage3,if=virtio,index=2 -drive cache=writeback,file=/dev/media4/q_storage4,if=ide,index=1 -drive cache=writeback,file=/dev/media5/q_storage5,if=ide,index=2 -net nic,macaddr=52:54:00:12:34:59,vlan=0,model=virtio,name=virtio.0 -net tap,fd=18,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice tablet -vnc 0.0.0.0:2 -k de -vga cirrus ---------------------------------------------------------------------- Comment By: Sebastian (grisia) Date: 2010-01-30 15:03 Message: Same behavior here.... Using QEMU-KVM version 0.11.1 error dosn't appears. Host (Gentoo): Vanilla Kernel Sources 2.6.32.7 QEMU-KVM 0.12.2 Guest image is a LVM2 partition (~30Gb) Guest is using virtio-blk Guest (Gentoo): Vanilla Kernel Sources 2.6.32.7 Filesystem is Ext4 Log after copying lot of files: Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 33555759 Jan 30 14:06:02 jboss kernel: [ 40.177314] Buffer I/O error on device vdd1, logical block 4194462 Jan 30 14:06:02 jboss kernel: [ 40.177314] lost page write due to I/O error on vdd1 ...snip... Jan 30 14:06:02 jboss kernel: [ 40.177314] Buffer I/O error on device vdd1, logical block 4194471 Jan 30 14:06:02 jboss kernel: [ 40.177314] lost page write due to I/O error on vdd1 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 33556767 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 37748799 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 37749055 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 41943103 Jan 30 14:06:02 jboss kernel: [ 40.177314] end_request: I/O error, dev vdd, sector 41943231 ...snip... Jan 30 14:06:02 jboss kernel: [ 40.535373] end_request: I/O error, dev vdd, sector 58734687 Jan 30 14:06:02 jboss kernel: [ 40.535421] end_request: I/O error, dev vdd, sector 58735695 Jan 30 14:06:02 jboss kernel: [ 40.535476] JBD: recovery failed Jan 30 14:06:02 jboss kernel: [ 40.535478] EXT4-fs (vdd1): error loading journal Jan 30 14:06:26 jboss kernel: [ 63.811349] EXT4-fs (vdd1): mounted filesystem with ordered data mode Jan 30 14:06:32 jboss kernel: [ 70.131957] EXT4-fs (vdc5): mounted filesystem with ordered data mode Jan 30 14:06:48 jboss kernel: [ 85.852773] JBD: barrier-based sync failed on vdd1-8 - disabling barriers Jan 30 14:07:08 jboss kernel: [ 106.391820] end_request: I/O error, dev vdd, sector 21283503 Jan 30 14:07:08 jboss kernel: [ 106.391823] __ratelimit: 12124 callbacks suppressed Jan 30 14:07:08 jboss kernel: [ 106.391826] Buffer I/O error on device vdd1, logical block 2660430 Jan 30 14:07:08 jboss kernel: [ 106.391828] lost page write due to I/O error on vdd1 Jan 30 14:07:08 jboss kernel: [ 106.391830] Buffer I/O error on device vdd1, logical block 2660431 Jan 30 14:07:08 jboss kernel: [ 106.391832] lost page write due to I/O error on vdd1 Jan 30 14:07:08 jboss kernel: [ 106.391834] Buffer I/O error on device vdd1, logical block 2660432 Jan 30 14:07:08 jboss kernel: [ 106.391835] lost page write due to I/O error on vdd1 ...snip... Jan 30 14:07:19 jboss kernel: [ 116.855081] end_request: I/O error, dev vdd, sector 38055999 Jan 30 14:07:19 jboss kernel: [ 116.855146] end_request: I/O error, dev vdd, sector 38056983 Jan 30 14:07:19 jboss kernel: [ 116.855207] end_request: I/O error, dev vdd, sector 38057935 Jan 30 14:07:19 jboss kernel: [ 116.855271] end_request: I/O error, dev vdd, sector 38058935 Jan 30 14:07:19 jboss kernel: [ 116.855339] end_request: I/O error, dev vdd, sector 38059943 Jan 30 14:07:19 jboss kernel: [ 116.855418] end_request: I/O error, dev vdd, sector 38060951 Jan 30 14:07:19 jboss kernel: [ 117.064320] JBD2: Detected IO errors while flushing file data on vdd1-8 Jan 30 14:07:19 jboss kernel: [ 117.064591] Aborting journal on device vdd1-8. Jan 30 14:07:19 jboss kernel: [ 117.064663] EXT4-fs error (device vdd1) in ext4_reserve_inode_write: Journal has aborted Jan 30 14:07:19 jboss kernel: [ 117.064921] EXT4-fs error (device vdd1) in ext4_dirty_inode: Journal has aborted Jan 30 14:07:19 jboss kernel: [ 117.065410] EXT4-fs error (device vdd1): ext4_journal_start_sb: Detected aborted journal Jan 30 14:07:19 jboss kernel: [ 117.065413] EXT4-fs (vdd1): Remounting filesystem read-only qemu -M pc -cpu host -name kvm.jboss -uuid 50259a3c-d6de-4c6c-811d-753f2df9658a -smp 1 -drive file=/dev/lg_virtual_machines/jboss,format=host_device,if=virtio,index=0,boot=on,cache=writeback -drive file=/dev/lg_virtual_machines_ext/jboss_swap,format=host_device,if=virtio,index=1,cache=writeback -drive file=/dev/lg_virtual_machines/yacy,format=host_device,if=virtio,index=2 -drive file=/dev/lg_virtual_machines_ext/yacy_backup,format=host_device,if=virtio,index=3 -net vde,vlan=0,name=jboss,sock=/var/run/vde2/vde_dmz -net nic,vlan=0,macaddr=54:52:02:77:5a:04,model=virtio,name=ifdmzjboss -k de -m 256 -parallel none -serial none -boot c -daemonize -vga std -nographic -balloon virtio -enable-kvm -pidfile /var/run/qemu/kvm.jboss.pid -vnc unix:/var/lib/qemu/vnc_displays/kvm.jboss.socket -monitor unix:/var/lib/qemu/monitors/kvm.jboss.socket,server,nowait ---------------------------------------------------------------------- Comment By: Avi Kivity (avik) Date: 2010-01-17 10:38 Message: 1 TB = 2^40 B = 2^31 sectors. Overflow? ---------------------------------------------------------------------- Comment By: MaSc82 (masc82) Date: 2010-01-16 15:59 Message: updated to prio9, since ppl moving from scsi/ide to virtio with existing filesystems will at one point experience corruption and data loss. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html