Bugs item #1895893, was opened at 2008-02-18 09:44
Message generated for change (Settings changed) made by jessorensen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1895893&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM-60+ halts, when using SCSI

Initial Comment:
Host: Intel CPU, F7/x64, KVM-60+ from git (userspace: kvm-60-155-g4422f97, kernelspace: kvm-60-10207-g9ef1f35)

When installing a Windows XP guest on an emulated SCSI disk, KVM locks up.

The command sent to Qemu/KVM:
/usr/local/bin/qemu-system-x86_64 -drive file=/vm/WindowsXP.qcow2,if=scsi,boot=on -m 128 -monitor tcp:localhost:4503,server,nowait -cdrom /isos/windows/WindowsXP-SP2-Home-Pro-Tablet.iso -boot d -name WindowsXP

Reproducible: Sometimes.

Symptoms:
- The image during XP setup looks halted/locked, with no progress over 12 hours.
- kvm_stat shows zero KVM activity.
- Host CPU is 100% busy.
- Qemu doesn't respond to any commands (such as alt+f2).

GNU Debugger shows:
(gdb) bt
#0  lsi_execute_script (s=0x2bed030) at ../cpu-all.h:848
#1  0x000000000048a2e9 in qcow_aio_write_cb (opaque=0x2c8a050, ret=0) at block-qcow2.c:947
#2  0x000000000041898f in qemu_aio_poll () at /root/git/kvm/qemu/block-raw-posix.c:318
#3  0x000000000040de3c in main_loop_wait (timeout=0) at /root/git/kvm/qemu/vl.c:7822
#4  0x00000000004fd81d in kvm_eat_signals (env=0x2b52400, timeout=0) at /root/git/kvm/qemu/qemu-kvm.c:204
#5  0x00000000004fd859 in kvm_main_loop_wait (env=0x2b52400, timeout=0) at /root/git/kvm/qemu/qemu-kvm.c:211
#6  0x00000000004fe0a6 in kvm_main_loop_cpu (env=0x2b52400) at /root/git/kvm/qemu/qemu-kvm.c:309
#7  0x0000000000410e3d in main (argc=<value optimized out>, argv=0x7fff06235728) at /root/git/kvm/qemu/vl.c:7856

====================================================
Dmesg shows:
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
...looping forever.

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------

>Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-19 23:10

Message:
Hi,

I verified with Marcelo (mtosatti) and the bug is supposed to be fixed.
Since there has been no activity in this one for more than two years I
assume that is the case.

If it reappears please open a new bug in launchpad.

Thanks,
Jes

----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-28 22:11

Message:
Logged In: YES
user_id=1865908
Originator: NO

OK. I've applied the matley patch and your debug patch to KVM 66. I've also
been able to reproduce the problem on a raw SCSI disk while installing
Windows 2003.
You can find the log at:
http://mel.byu.edu/kvm-scsi-debug.log.bz2

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2008-04-27 01:45

Message:
Logged In: YES
user_id=2022487
Originator: NO

Alexey, Alberto,

I'm unable to reproduce the problem with the Linux driver. The Windows
SCSI SCRIPTS code is different, so that might be the reason. The state
machine is relatively complex and depends on this SCRIPTS code.

Please try the following:

1 - Attempt to reproduce the problem with raw disk instead of qcow2.
2 - Apply matley's patch below, and on top of that, this debug patch:

http://people.redhat.com/~mtosatti/lsi-debug-crash.patch

And then run qemu-kvm as usual, but redirect stderr output to a file:

# qemu-kvm options 2> log-scsi-crash.txt

Once the crash happens, there should be a pattern that repeats in this
output. With that information it's easier to understand what is going on.

Thanks.

----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-12 00:46

Message:
Logged In: YES
user_id=1865908
Originator: NO

I've confirmed the problem with KVM-65 as well. I applied the patch but it
didn't work; I still experienced lockups.

I am trying to install Windows Server 2003 on a SCSI disk and the
installation keeps locking up on different parts of the file copy process.
I'm using qcow2 disk format. I tried using raw format and it would lock up
consistently when formatting the disk. I have tried installing W2K3 at
least a dozen times with the same lockups.

As part of my configuration, I move the monitor to run on a telnet server.
When the lockup occurs, I can't connect to the monitor via telnet.

I am also experiencing boot problems with Grub on SCSI disks. I reported
the problem on the mailing list:

http://article.gmane.org/gmane.comp.emulators.kvm.devel/15884

I don't know if the problems are related.

----------------------------------------------------------------------

Comment By: lanconnected (lanconnected)
Date: 2008-04-08 18:17

Message:
Logged In: YES
user_id=2041746
Originator: NO

Applied the proposed patch on kvm-65. Windows XP Pro can be installed on a
SCSI disk and boots up, but hangs unpredictably during disk activity. The
SDL window can't be closed; kvm can only be killed with kill -9.

----------------------------------------------------------------------

Comment By: Matteo Frigo (matley)
Date: 2008-03-30 14:58

Message:
Logged In: YES
user_id=35769
Originator: NO

The bug seems to have nothing to do with Windows. You can reproduce the
bug in kvm-63 and kvm-64 by creating an empty qcow2 scsi disk and running
``dd if=/dev/sda of=/dev/null bs=1M'' in Linux.

The patch below seems to fix the problem (at least with Linux, I haven't
tried Windows). If I understand the AIO layer correctly, scsi_read_data()
and scsi_write_data() can be called again before the bdrv_aio_read call
returns. If this happens, the original code reissues the same request
twice, which is incorrect. The patch increments the read/write counters
before invoking the AIO layer.
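
The reentrancy described here can be illustrated with a small standalone toy model. This is only a sketch and not QEMU code: struct toy_req, toy_aio_read(), toy_read_data(), toy_run() and toy_print() are hypothetical names, and the model assumes that a top-level submission may complete inline, so the completion path asks for the next chunk before the submit call has returned.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_CHUNK 4   /* sectors per request; arbitrary toy value */
#define TOY_MAX_LOG 8 /* capacity of the submission log */

/* Toy stand-in for the per-request state; not the real QEMU structure. */
struct toy_req {
    int sector;           /* next sector to read */
    int sector_count;     /* sectors still to read */
    int update_first;     /* 1: advance counters before submitting (patched order) */
    int depth;            /* nesting depth of inline completions */
    int log[TOY_MAX_LOG]; /* start sector of every submitted read */
    int nlog;
};

static void toy_read_data(struct toy_req *r);

/* Toy AIO submit. Assumption: a top-level submission may complete inline,
 * so the completion path requests the next chunk before this call returns. */
static void toy_aio_read(struct toy_req *r, int sector)
{
    assert(r->nlog < TOY_MAX_LOG);
    r->log[r->nlog++] = sector;
    if (r->depth == 0) {
        r->depth++;
        toy_read_data(r); /* re-entry while the outer call is still running */
        r->depth--;
    }
}

/* Issue the next chunk, in either the original or the patched order. */
static void toy_read_data(struct toy_req *r)
{
    int n = r->sector_count < TOY_CHUNK ? r->sector_count : TOY_CHUNK;

    if (n <= 0)
        return;
    if (r->update_first) {
        /* Patched order: counters are already advanced on re-entry. */
        r->sector += n;
        r->sector_count -= n;
        toy_aio_read(r, r->sector - n);
    } else {
        /* Original order: a re-entrant call still sees the old sector. */
        toy_aio_read(r, r->sector);
        r->sector += n;
        r->sector_count -= n;
    }
}

/* Read 8 sectors starting at sector 0 and return the final state. */
struct toy_req toy_run(int update_first)
{
    struct toy_req r = { 0 };

    r.sector_count = 2 * TOY_CHUNK;
    r.update_first = update_first;
    toy_read_data(&r);
    return r;
}

/* Print the start sector of each submitted read. */
void toy_print(const struct toy_req *r)
{
    for (int i = 0; i < r->nlog; i++)
        printf("%d ", r->log[i]);
    printf("\n");
}
```

In this model, toy_run(0) submits sector 0 twice and never submits sector 4, while toy_run(1) submits sector 0 and then sector 4. Whether the real AIO layer re-enters in exactly this way is the hypothesis stated above, not something the toy model proves.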

diff -aur kvm-64.old/qemu/hw/scsi-disk.c kvm-64.new/qemu/hw/scsi-disk.c
--- kvm-64.old/qemu/hw/scsi-disk.c	2008-03-26 08:49:35.000000000 -0400
+++ kvm-64.new/qemu/hw/scsi-disk.c	2008-03-30 08:37:25.000000000 -0400
@@ -196,12 +196,12 @@
     n = SCSI_DMA_BUF_SIZE / 512;
 
     r->buf_len = n * 512;
-    r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
+    r->sector += n;
+    r->sector_count -= n;
+    r->aiocb = bdrv_aio_read(s->bdrv, r->sector - n, r->dma_buf, n,
                              scsi_read_complete, r);
     if (r->aiocb == NULL)
         scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-    r->sector += n;
-    r->sector_count -= n;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
@@ -248,12 +248,12 @@
         BADF("Data transfer already in progress\n");
     n = r->buf_len / 512;
     if (n) {
-        r->aiocb = bdrv_aio_write(s->bdrv, r->sector, r->dma_buf, n,
+        r->sector += n;
+        r->sector_count -= n;
+        r->aiocb = bdrv_aio_write(s->bdrv, r->sector - n, r->dma_buf, n,
                                   scsi_write_complete, r);
         if (r->aiocb == NULL)
             scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-        r->sector += n;
-        r->sector_count -= n;
     } else {
         /* Invoke completion routine to fetch data from host.  */
         scsi_write_complete(r, 0);

----------------------------------------------------------------------

Comment By: lanconnected (lanconnected)
Date: 2008-03-20 20:23

Message:
Logged In: YES
user_id=2041746
Originator: NO

Can confirm it on kvm-63, 100% reproducible, same symptoms.
The system can be installed and always boots in safe mode, but never boots
in normal mode. ACPI/noACPI settings have no influence.

----------------------------------------------------------------------

Comment By: Technologov (technologov)
Date: 2008-02-18 11:21

Message:
Logged In: YES
user_id=1839746
Originator: YES

ps axu:
alexeye 21429 84.2 4.1 296740 166712 pts/4 Rl+ 04:40 16:22 /usr/local/bin/qemu-system-x86_64 -drive file=/vm/WindowsXP.qcow2,if=scsi,boot=on -m 128 -monitor tcp:localhost:4503,server,nowait -cdrom /isos/windows/WindowsXP-SP2-Home-Pro-Tablet.iso -boot c -name WindowsXP-SCSI-manual -no-kvm

Another symptom I forgot to mention: Qemu (both KVM and -no-kvm) cannot be
killed by pressing "X" on the SDL window, only by doing ctrl+C on the
console.

Does anyone know what "Rl+" means in the "ps" command output?

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------

Comment By: Technologov (technologov)
Date: 2008-02-18 11:18

Message:
Logged In: YES
user_id=1839746
Originator: YES

Well, the same problem is reproducible with Qemu (-no-kvm):

Same symptoms.

(gdb) bt
#0  0x000000000048ea9d in cpu_physical_memory_rw (addr=72552, buf=0x7fff26397b70 "???\200", len=4, is_write=0) at /root/git/kvm/qemu/exec.c:2682
#1  0x000000000041b0db in lsi_execute_script (s=0x2bed030) at ../cpu-all.h:848
#2  0x000000000048a2e9 in qcow_aio_write_cb (opaque=0x2bcefa0, ret=0) at block-qcow2.c:947
#3  0x000000000041898f in qemu_aio_poll () at /root/git/kvm/qemu/block-raw-posix.c:318
#4  0x000000000040de3c in main_loop_wait (timeout=10) at /root/git/kvm/qemu/vl.c:7822
#5  0x0000000000410d97 in main (argc=<value optimized out>, argv=0x7fff2639c858) at /root/git/kvm/qemu/vl.c:7926

-Alexey "Technologov", 18.02.2008.