On Tue, 2010-11-09 at 10:44 -0700, Alex Williamson wrote: > On Tue, 2010-11-09 at 18:49 +0200, Michael S. Tsirkin wrote: > > On Tue, Nov 09, 2010 at 09:30:45AM -0700, Alex Williamson wrote: > > > On Tue, 2010-11-09 at 18:15 +0200, Michael S. Tsirkin wrote: > > > > On Tue, Nov 09, 2010 at 08:47:00AM -0700, Alex Williamson wrote: > > > > > > > But it could. What if ivshmem is acting in a peer role, but has no > > > > > > > clients, could it migrate? What if ivshmem is migratable when the > > > > > > > migration begins, but while the migration continues, a connection is > > > > > > > setup and it becomes unmigratable. > > > > > > > > > > > > Sounds like something we should work to prevent, not support :) > > > > > > > > > > s/:)/:(/ why? > > > > > > > > It will just confuse everyone. Also if it happens after sending > > > > all of memory, it's pretty painful. > > > > > > It happens after sending all of memory with no_migrate, and I think > > > pushing that earlier might introduce some races around when > > > register_device_unmigratable() can be called. > > > > Good point. I guess we could check it twice just to speed things up. > > > > > > > > > Using this series, ivshmem would > > > > > > > have multiple options how to support this. It could a) NAK the > > > > > > > migration, b) drop connections and prevent new connections until the > > > > > > > migration finishes, c) detect that new connections have happened since > > > > > > > the migration started and cancel. And probably more. no_migrate can > > > > > > > only do a). And in fact, we can only test no_migrate after the VM is > > > > > > > stopped (after all memory is migrated) because otherwise it could race > > > > > > > with devices setting no_migrate during migration. > > > > > > > > > > > > We really want no_migrate to be static. changing it is abusing > > > > > > the infrastructure. > > > > > > > > > > You call it abusing, I call it making use of the infrastructure. Why > > > > > unnecessarily restrict ourselves? Is return 0/-1 really that scary, > > > > > unmaintainable, undebuggable? I don't understand the resistance. > > > > > > > > > > Alex > > > > > > > > management really does not know how to handle unexpected > > > > migration failures. They must be avoided. > > > > > > > > There are some very special cases that fail migration. They are > > > > currently easy to find with grep register_device_unmigratable. > > > > I prefer to keep it that way. > > > > > > How can management tools be improved to better handle unexpected > > > migration failures when the only way for qemu to fail is an abort? > > > We need the infrastructure to at least return an error first. Do we just > > > need to add some fprintfs to the save core to print the id string of the > > > device that failed to save? I just can't buy the "code is easier to > > > grep" as an argument against adding better error handling to the save > > > code path. > > > > I just don't buy the 'we'll return meaningless error codes at random > > point in time and management will figure it out' as an argument :) > > Why is the error code meaningless? The error code stops the migration > in qemu and hopefully prints an error message (we could easily add an > fprintf to the core save code to ensure we know the device responsible > for the NAK). From there we can figure out how to return the error to > monitors and management tools, but we need to have a way to know there's > an error first. > > > > Anyone else want to chime in? > > > > > > Alex > > > > Maybe try coding up some user using the new infrastructure to do > > something useful, that register_device_unmigratable can't do. > > With the number of people I hear complaining about how qemu has too many > aborts, no error checking, and no way to return errors, I'm a little > dumbfounded that there's such a roadblock to actually add some simple > error handling. Is it the error handling you're opposed to, or the way > I'm using it to NAK a migration? Would something like this in place of 6/6 ease your grep'ability and debug'ability concerns? Unfortunately it adds an assert, but if we stomp on the vmsd, then the save state entry can't be unregistered. Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx> --- diff --git a/savevm.c b/savevm.c index 521edc8..b0aaa46 100644 --- a/savevm.c +++ b/savevm.c @@ -1039,7 +1039,6 @@ typedef struct SaveStateEntry { const VMStateDescription *vmsd; void *opaque; CompatEntry *compat; - int no_migrate; } SaveStateEntry; @@ -1103,7 +1102,6 @@ int register_savevm_live(DeviceState *dev, se->load_state = load_state; se->opaque = opaque; se->vmsd = NULL; - se->no_migrate = 0; if (dev && dev->parent_bus && dev->parent_bus->info->get_dev_path) { char *id = dev->parent_bus->info->get_dev_path(dev); @@ -1170,6 +1168,22 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) } } +static int nomigrate_set_params(int blk_enable, int shared, void *opaque) +{ + return -EFAULT; +} + +static int nomigrate_save_live_state(Monitor *mon, QEMUFile *f, int stage, + void *opaque) +{ + return -EFAULT; +} + +static int nomigrate_save_state(QEMUFile *f, void *opaque) +{ + return -EFAULT; +} + /* mark a device as not to be migrated, that is the device should be unplugged before migration */ void register_device_unmigratable(DeviceState *dev, const char *idstr, @@ -1190,7 +1204,10 @@ void register_device_unmigratable(DeviceState *dev, const char *idstr, QTAILQ_FOREACH(se, &savevm_handlers, entry) { if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) { - se->no_migrate = 1; + se->set_params = nomigrate_set_params; + se->save_live_state = nomigrate_save_live_state; + se->save_state = nomigrate_save_state; + assert(!se->vmsd); } } } @@ -1410,10 +1427,6 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id) static int vmstate_save(QEMUFile *f, SaveStateEntry *se) { - if (se->no_migrate) { - return -1; - } - if (!se->vmsd) { /* Old style */ return se->save_state(f, se->opaque); } @@ -1443,6 +1456,9 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable, } ret = se->set_params(blk_enable, shared, se->opaque); if (ret < 0) { + monitor_printf(mon, + "Save state begin blocked by device '%s', error:" + " %s\n", se->idstr, strerror(-ret)); return ret; } } @@ -1470,6 +1486,9 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable, ret = se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque); if (ret < 0) { + monitor_printf(mon, + "Save state begin blocked by device '%s', error:" + " %s\n", se->idstr, strerror(-ret)); return ret; } } @@ -1503,6 +1522,9 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f) synchronized over and over again. */ break; } else if (ret < 0) { + monitor_printf(mon, + "Save state iterate blocked by device '%s', error:" + " %s\n", se->idstr, strerror(-ret)); return ret; } } @@ -1535,6 +1557,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) r = se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque); if (r < 0) { + monitor_printf(mon, + "Save state complete blocked by device '%s', error:" + " %s\n", se->idstr, strerror(-r)); return r; } } @@ -1559,7 +1584,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) r = vmstate_save(f, se); if (r < 0) { - monitor_printf(mon, "cannot migrate with device '%s'\n", se->idstr); + monitor_printf(mon, + "Save state complete blocked by device '%s', error:" + " %s\n", se->idstr, strerror(-r)); return r; } } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html