RE: [Question] virtio-serial misses irq delivery on migration?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

I might catch the bug. Now, we rely on a vm_clock timer to assure that the vcpus are started
before we invoke send_control_event(),which queue a message in the virtio ring and trigger a irq.

In actually, we can't attach our initial intention under a special situation:

1. the process of migration destination is in a coroutine
 Commit 82a4da79fd6
2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
 if the Qemu encounter a EAGIN while reading QEMUFile.
 commit 595ab64169b

static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
                                 size_t size)
{
    QEMUFileSocket *s = opaque;
    ssize_t len;

    for (;;) {
        len = qemu_recv(s->fd, buf, size, 0);
        if (len != -1) {
            break;
        }
        if (socket_error() == EAGAIN) {
            yield_until_fd_readable(s->fd);
        } else if (socket_error() != EINTR) {
            break;
        }
    }

    if (len == -1) {
        len = -socket_error();
    }
    return len;
}

3. The main thread can get the cpu and timer will run, asumes that we get EAGIN
  after invoking fetch_active_ports_list().
4. reproduce the problem by fault injection.

The debug logs:

[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
[2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co()    // Enter corountine
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
[2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
[2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug **********    // fault injection begin, migration coroutine yiled cpu
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592    //vm_clock timer boom
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb   //calling the timer callback
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line.
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state   //Enter migration coroutine again, and the restore the lapic register, the previous lapic's info will be covered :(
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
[2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started 
[2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start            //enter vm_start here
[2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
[2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961


The below is my debugging diff (Not based on the newest master branch)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 6a45af9..a00b02f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
     if (!s->post_load) {
         return;
     }
+    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
     for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
         port = s->post_load->connected[i].port;
         host_connected = s->post_load->connected[i].host_connected;
@@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
              */
             send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
                                port->host_connected);
+            QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
         }
         vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
         if (vsc->set_guest_connected) {
@@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
         }
     }
     timer_mod(s->post_load->timer, 1);
+    QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
+    yield_until_fd_readable(qemu_get_fd(f));
     return 0;
 }
 
@@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id > 3) {
         return -EINVAL;
     }
-
+    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
     /* The virtio device */
     return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
 }
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 5b47056..2d988cb 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
 {
     APICCommonState *s = APIC_COMMON(dev);
     int i;
-
+    QEMU_LOG("gonglei: kvm_put_apic_state\n");
     memset(kapic, 0, sizeof(*kapic));
     kvm_apic_set_reg(kapic, 0x2, s->id << 24);
     kvm_apic_set_reg(kapic, 0x8, s->tpr);
diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index d2a6c4c..07699ec 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
 
     chip.chip_id = KVM_IRQCHIP_IOAPIC;
     kioapic = &chip.chip.ioapic;
-
+    QEMU_LOG("gonglei: kvm_ioapic_put\n");
     kioapic->id = s->id;
     kioapic->ioregsel = s->ioregsel;
     kioapic->base_address = s->busdev.mmio[0].addr;
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 042e960..110be20 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
     if (info->post_load) {
         info->post_load(s);
     }
+    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
     return 0;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 92c1427..d074c0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
     const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
     const char *portName = "charchannel1";
     size_t uRet;
-
+    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
diff --git a/qemu-timer.c b/qemu-timer.c
index 5741f0d..dd36bc9 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
     QEMUTimerList *tl;
     bool old = clock->enabled;
     clock->enabled = enabled;
+    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
     if (enabled && !old) {
         qemu_clock_notify(type);
     } else if (!enabled && old) {
@@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
     bool progress = false;
     QEMUTimerCB *cb;
     void *opaque;
+    static int count = 10;
 
     qemu_event_reset(&timer_list->timers_done_ev);
     if (!timer_list->clock->enabled) {
@@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
             qemu_mutex_unlock(&timer_list->active_timers_lock);
             break;
         }
-
+	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
+	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
+             ts->expire_time, current_time);
+	}
         /* remove timer from the list before calling the callback */
         timer_list->active_timers = ts->next;
         ts->next = NULL;
diff --git a/vl.c b/vl.c
index 06f34fe..824b246 100644
--- a/vl.c
+++ b/vl.c
@@ -978,7 +978,7 @@ void vm_start(void)
 {
     RunState requested;
     int64_t start_time, end_time;
-
+    QEMU_LOG("gonglei: enter vm_start\n");
     qemu_vmstop_requested(&requested);
     if (runstate_is_running() && requested == RUN_STATE_MAX) {
         return;


Can we wait looply the migration process finished when EAGAIN?

Regards,
-Gonglei


> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Friday, August 05, 2016 5:14 PM
> To: Paolo Bonzini; 'jan.kiszka@xxxxxxxxxxx'; 'amit.shah@xxxxxxxxxx'
> Cc: qemu-devel@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Huangweidong (C)
> Subject: [Question] virtio-serial misses irq delivery on migration?
> 
> Hi Paolo , Jan, Amit
> 
> Recently we encountered a problem that the virtio-serial can't work after
> Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> 
> It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> because
> the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> interrupt
> can't be injected to VM because the irq line bit had been set to 1, but the
> frontend
> driver never handle it or never know it.
> 
>  Bug 867366 - virtio-serial misses irq delivery on migration
> https://bugzilla.redhat.com/show_bug.cgi?id=867366
> 
> But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> post_load
> send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> propagate
> guest_connected to the port on post_load" are applied.
> 
> I noticed that Paolo posted another problem maybe have a pertential problem
> about
> apic in Comment 23. But this patch
>  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> haven't merged into qemu master.
> 
> Would you give me some clues please? Thanks!
> 
> Regards,
> -Gonglei
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux