[ kvm-Bugs-1971512 ] failure to migrate guests with more than 4GB of RAM

Bugs item #1971512, was opened at 2008-05-24 14:45
Message generated for change (Comment added) made by jiajun
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1971512&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
>Resolution: Fixed
Priority: 3
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Anthony Liguori (aliguori)
Summary: failure to migrate guests with more than 4GB of RAM

Initial Comment:

The migration code assumes linear "phys_ram_base":

[root@localhost kvm-userspace.tip]# qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/images/marcelo5-io-test.img -m 4097 -net nic,model=rtl8139 -net tap,script=/root/iptables/ifup -incoming tcp://0:4444/
audit_log_user_command(): Connection refused
audit_log_user_command(): Connection refused
migration: memory size mismatch: recv 22032384 mine 4316999680
migrate_incoming_fd failed (rc=232)
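
A reader's arithmetic check on the two sizes in the log above (not part of the original report): the received value is exactly the local value with everything above the low 32 bits dropped, which is what one would expect if the size passed through a 32-bit field at some point. The snippet only reproduces that arithmetic. The helper name `truncate_u32` is made up for illustration and is not a QEMU function.

```python
# Sizes copied verbatim from the "memory size mismatch" line in the log.
RECV = 22032384      # value the incoming side received
MINE = 4316999680    # value the incoming side computed locally


def truncate_u32(value: int) -> int:
    """Keep only the low 32 bits, as a 32-bit wire field would (illustrative helper)."""
    return value & 0xFFFFFFFF


print(truncate_u32(MINE))    # 22032384
print(MINE - RECV == 2**32)  # True
```

The difference between the two numbers is exactly 2^32, so the mismatch is consistent with a 32-bit truncation rather than with a genuinely different guest configuration.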


----------------------------------------------------------------------

>Comment By: Jiajun Xu (jiajun)
Date: 2009-08-20 23:38

Message:
We verified the bug with kvm.git: 779cc54dbccaa3a00d70a9d61d090be5d9ccc903
qemu.git: 9e3269181e9bc56feb43bcd4e8ce0b82cd543e65, the issue is fixed.

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 17:00

Message:
This has been fixed by Glauber.

----------------------------------------------------------------------

Comment By: Jiajun Xu (jiajun)
Date: 2008-12-15 18:37

Message:
We did not run any workload; we started the migration right after the guest
booted up and became idle.

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-12-14 07:45

Message:
What workload is the guest running during the migration?

----------------------------------------------------------------------

Comment By: Jiajun Xu (jiajun)
Date: 2008-12-09 19:09

Message:
Reopening the bug, since live migration of a 4GB guest still fails on my
machine. The guest prints a call trace after live migration.

----------------------------------------------------------------------

Comment By: SourceForge Robot (sf-robot)
Date: 2008-12-07 18:22

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Jiajun Xu (jiajun)
Date: 2008-11-24 21:52

Message:
I tried the latest commit, userspace.git
6e63ba19476753595e508713eb9daf559dc50bf6, with a 64-bit RHEL5.1 guest. My
host kernel is 2.6.26.2, and the host has 8GB of memory and 4GB of swap.
The guest can be live migrated, but afterwards it prints a call trace.

Maybe we can compare our environments.

My steps were as follows:
1. qemu-system-x86_64 -incoming tcp:localhost:4444 -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
2. qemu-system-x86_64 -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
3. In qemu console, type "migrate tcp:localhost:4444"

The call trace messages in guest:
###################
Kernel BUG at block/elevator.c:560
invalid opcode: 0000 [1] SMP 
last sysfs file: /block/hda/removable
CPU 0 
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc
 iscsi_tcp ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm ib_srp
 ib_sdp rdma_cm ib_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_sa ib_uverbs
 ib_umad ib_mad ib_core dm_mirror dm_multipath dm_mod video sbs backlight
 i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac lp floppy
 pcspkr serio_raw 8139cp 8139too parport_pc parport mii ide_cd cdrom
 ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-53.el5 #1
RIP: 0010:[<ffffffff80134673>]  [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
RSP: 0018:ffffffff8040ddc0  EFLAGS: 00010046
RAX: 0000000100000000 RBX: ffff81011381b398 RCX: 0000000000000000
RDX: ffff81011381b398 RSI: ffff81011381b398 RDI: ffff81011fb912c0
RBP: ffffffff804abe18 R08: ffffffff80304108 R09: 0000000000000012
R10: 0000000000000022 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000086 R15: ffffffff8040deb8
FS:  0000000000000000(0000) GS:ffffffff80396000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaad6f4d0 CR3: 00000001126cc000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803c6000, task ffffffff802dcae0)
Stack:  ffffffff8000ae3c ffffffff804abe18 ffffffff804abe50 0000000000000000
 ffffffff804abd00 0000000000000246 ffffffff8003ba73 ffffffff8003ba0c
 ffffffff804abe18 ffff81011fbe5800 ffffffff8000d2a5 ffff81011fb8c5c0
Call Trace:
 <IRQ>  [<ffffffff8000ae3c>] ide_end_request+0xc6/0xfc
 [<ffffffff8003ba73>] ide_dma_intr+0x67/0xab
 [<ffffffff8003ba0c>] ide_dma_intr+0x0/0xab
 [<ffffffff8000d2a5>] ide_intr+0x16f/0x1df
 [<ffffffff800107a0>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b5482>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a3bd>] do_IRQ+0xe7/0xf5
 [<ffffffff8005b615>] ret_from_intr+0x0/0xa
 [<ffffffff80011ca9>] __do_softirq+0x53/0xd5
 [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006a53a>] do_softirq+0x2c/0x85
 [<ffffffff80068d0e>] default_idle+0x0/0x50
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068d37>] default_idle+0x29/0x50
 [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
 [<ffffffff803d1806>] start_kernel+0x220/0x225
 [<ffffffff803d1237>] _sinittext+0x237/0x23e


Code: 0f 0b 68 25 50 29 80 c2 30 02 48 8b 46 08 48 89 42 08 48 89 
RIP  [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 RSP <ffffffff8040ddc0>
 <0>Kernel panic - not syncing: Fatal exception
 BUG: warning at kernel/panic.c:137/panic() (Not tainted)

Call Trace:
 <IRQ>  [<ffffffff8008ccca>] panic+0x1e3/0x1f4
 [<ffffffff80196ae8>] do_unblank_screen+0x1b/0x132
 [<ffffffff800631aa>] oops_end+0x51/0x53
 [<ffffffff80069689>] die+0x3a/0x44
 [<ffffffff80069c37>] do_invalid_op+0xad/0xb7
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff80092dd4>] do_timer+0x2e8/0x53c
 [<ffffffff8006c0cc>] main_timer_handler+0x23d/0x3f4
 [<ffffffff8005bde9>] error_exit+0x0/0x84
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff8000ae3c>] ide_end_request+0xc6/0xfc
 [<ffffffff8003ba73>] ide_dma_intr+0x67/0xab
 [<ffffffff8003ba0c>] ide_dma_intr+0x0/0xab
 [<ffffffff8000d2a5>] ide_intr+0x16f/0x1df
 [<ffffffff800107a0>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b5482>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a3bd>] do_IRQ+0xe7/0xf5
 [<ffffffff8005b615>] ret_from_intr+0x0/0xa
 [<ffffffff80011ca9>] __do_softirq+0x53/0xd5
 [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006a53a>] do_softirq+0x2c/0x85
 [<ffffffff80068d0e>] default_idle+0x0/0x50
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068d37>] default_idle+0x29/0x50
 [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
 [<ffffffff803d1806>] start_kernel+0x220/0x225
 [<ffffffff803d1237>] _sinittext+0x237/0x23e

BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Not tainted)

Call Trace:
 <IRQ>  [<ffffffff801ee9b8>] i8042_panic_blink+0x112/0x2a5
 [<ffffffff8008cc70>] panic+0x189/0x1f4
 [<ffffffff80196ae8>] do_unblank_screen+0x1b/0x132
 [<ffffffff800631aa>] oops_end+0x51/0x53
 [<ffffffff80069689>] die+0x3a/0x44
 [<ffffffff80069c37>] do_invalid_op+0xad/0xb7
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff80092dd4>] do_timer+0x2e8/0x53c
 [<ffffffff8006c0cc>] main_timer_handler+0x23d/0x3f4
 [<ffffffff8005bde9>] error_exit+0x0/0x84
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff8000ae3c>] ide_end_request+0xc6/0xfc
 [<ffffffff8003ba73>] ide_dma_intr+0x67/0xab
 [<ffffffff8003ba0c>] ide_dma_intr+0x0/0xab
 [<ffffffff8000d2a5>] ide_intr+0x16f/0x1df
 [<ffffffff800107a0>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b5482>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a3bd>] do_IRQ+0xe7/0xf5
 [<ffffffff8005b615>] ret_from_intr+0x0/0xa
 [<ffffffff80011ca9>] __do_softirq+0x53/0xd5
 [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006a53a>] do_softirq+0x2c/0x85
 [<ffffffff80068d0e>] default_idle+0x0/0x50
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068d37>] default_idle+0x29/0x50
 [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
 [<ffffffff803d1806>] start_kernel+0x220/0x225
 [<ffffffff803d1237>] _sinittext+0x237/0x23e

BUG: warning at drivers/input/serio/i8042.c:849/i8042_panic_blink() (Not tainted)

Call Trace:
 <IRQ>  [<ffffffff801eeaa1>] i8042_panic_blink+0x1fb/0x2a5
 [<ffffffff8008cc70>] panic+0x189/0x1f4
 [<ffffffff80196ae8>] do_unblank_screen+0x1b/0x132
 [<ffffffff800631aa>] oops_end+0x51/0x53
 [<ffffffff80069689>] die+0x3a/0x44
 [<ffffffff80069c37>] do_invalid_op+0xad/0xb7
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff80092dd4>] do_timer+0x2e8/0x53c
 [<ffffffff8006c0cc>] main_timer_handler+0x23d/0x3f4
 [<ffffffff8005bde9>] error_exit+0x0/0x84
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff8000ae3c>] ide_end_request+0xc6/0xfc
 [<ffffffff8003ba73>] ide_dma_intr+0x67/0xab
 [<ffffffff8003ba0c>] ide_dma_intr+0x0/0xab
 [<ffffffff8000d2a5>] ide_intr+0x16f/0x1df
 [<ffffffff800107a0>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b5482>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a3bd>] do_IRQ+0xe7/0xf5
 [<ffffffff8005b615>] ret_from_intr+0x0/0xa
 [<ffffffff80011ca9>] __do_softirq+0x53/0xd5
 [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006a53a>] do_softirq+0x2c/0x85
 [<ffffffff80068d0e>] default_idle+0x0/0x50
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068d37>] default_idle+0x29/0x50
 [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
 [<ffffffff803d1806>] start_kernel+0x220/0x225
 [<ffffffff803d1237>] _sinittext+0x237/0x23e

BUG: warning at drivers/input/serio/i8042.c:851/i8042_panic_blink() (Not tainted)

Call Trace:
 <IRQ>  [<ffffffff801eeb1e>] i8042_panic_blink+0x278/0x2a5
 [<ffffffff8008cc70>] panic+0x189/0x1f4
 [<ffffffff80196ae8>] do_unblank_screen+0x1b/0x132
 [<ffffffff800631aa>] oops_end+0x51/0x53
 [<ffffffff80069689>] die+0x3a/0x44
 [<ffffffff80069c37>] do_invalid_op+0xad/0xb7
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff80092dd4>] do_timer+0x2e8/0x53c
 [<ffffffff8006c0cc>] main_timer_handler+0x23d/0x3f4
 [<ffffffff8005bde9>] error_exit+0x0/0x84
 [<ffffffff80134673>] elv_dequeue_request+0x8/0x3c
 [<ffffffff8000ae3c>] ide_end_request+0xc6/0xfc
 [<ffffffff8003ba73>] ide_dma_intr+0x67/0xab
 [<ffffffff8003ba0c>] ide_dma_intr+0x0/0xab
 [<ffffffff8000d2a5>] ide_intr+0x16f/0x1df
 [<ffffffff800107a0>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b5482>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a3bd>] do_IRQ+0xe7/0xf5
 [<ffffffff8005b615>] ret_from_intr+0x0/0xa
 [<ffffffff80011ca9>] __do_softirq+0x53/0xd5
 [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006a53a>] do_softirq+0x2c/0x85
 [<ffffffff80068d0e>] default_idle+0x0/0x50
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068d37>] default_idle+0x29/0x50
 [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
 [<ffffffff803d1806>] start_kernel+0x220/0x225
 [<ffffffff803d1237>] _sinittext+0x237/0x23e
###################


----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-11-23 11:18

Message:
Fixed by new migration protocol from upstream qemu.

----------------------------------------------------------------------

Comment By: Anthony Liguori (aliguori)
Date: 2008-05-26 09:48

Message:
The issue isn't actually the use of phys_ram_base.  In the case of
migration, we don't care about the layout of physical memory.  We just want
to look at memory from phys_ram_base .. ram_size.

The problem is that we encode physical addresses in the migration protocol
as 32-bit values.  We'll need to figure out a way to switch to encoding
PFNs while maintaining backwards compatibility with the current code.
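
To illustrate the point about 32-bit addresses versus PFNs, here is a minimal sketch, not QEMU's actual wire format. It assumes 4 KiB pages and a big-endian 32-bit field; the names `encode_addr_u32`, `encode_pfn_u32` and `decode_pfn_u32` are hypothetical. It shows that a byte address at the 4 GiB boundary wraps to zero in a 32-bit field, while the page frame number of the same address still fits.

```python
import struct

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def encode_addr_u32(addr: int) -> bytes:
    """Hypothetical old-style field: low 32 bits of the byte address."""
    return struct.pack(">I", addr & 0xFFFFFFFF)


def encode_pfn_u32(addr: int) -> bytes:
    """Hypothetical PFN field: page frame number of a page-aligned address."""
    if addr % (1 << PAGE_SHIFT) != 0:
        raise ValueError("address must be page-aligned")
    # struct.pack raises struct.error if the PFN itself exceeds 32 bits.
    return struct.pack(">I", addr >> PAGE_SHIFT)


def decode_pfn_u32(data: bytes) -> int:
    """Recover the byte address from a PFN field."""
    (pfn,) = struct.unpack(">I", data)
    return pfn << PAGE_SHIFT


addr = 4 * 1024**3  # first byte above 4 GiB

# Encoding the byte address loses the high bit: it reads back as 0.
print(struct.unpack(">I", encode_addr_u32(addr))[0])  # 0

# Encoding the PFN round-trips correctly.
print(decode_pfn_u32(encode_pfn_u32(addr)) == addr)   # True
```

Under these assumptions a 32-bit PFN can describe addresses up to 2^44 bytes (16 TiB). The sketch does not address the backwards-compatibility concern raised above, since an old receiver would still interpret the field as a byte address.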

----------------------------------------------------------------------