The issue is reproducible in svl-3 with rbd cache set to false. On the 5th ping-pong (back-and-forth) live migration, the instance started dropping pings and did not recover for 20+ minutes:

(os-clients)[root@fedora21 nimbus-env]# nova live-migration lmtest1
(os-clients)[root@fedora21 nimbus-env]# nova show lmtest1 |grep -E 'hypervisor_hostname|task_state|vm_state'
| OS-EXT-SRV-ATTR:hypervisor_hostname | svl-3-cc-nova1-002.cisco.com |
| OS-EXT-STS:task_state               | migrating                    |
| OS-EXT-STS:vm_state                 | active                       |
(os-clients)[root@fedora21 nimbus-env]# nova show lmtest1 |grep -E 'hypervisor_hostname|task_state|vm_state'
| OS-EXT-SRV-ATTR:hypervisor_hostname | svl-3-cc-nova1-001.cisco.com |
| OS-EXT-STS:task_state               | -                            |
| OS-EXT-STS:vm_state                 | active                       |
(os-clients)[root@fedora21 nimbus-env]# ping -c3 -S60 10.33.143.215
PING 10.33.143.215 (10.33.143.215) 56(84) bytes of data.

--- 10.33.143.215 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2001ms

(os-clients)[root@fedora21 nimbus-env]# ping -c3 -S60 10.33.143.215
PING 10.33.143.215 (10.33.143.215) 56(84) bytes of data.

--- 10.33.143.215 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

(os-clients)[root@fedora21 nimbus-env]# ping -c3 -S60 10.33.143.215
PING 10.33.143.215 (10.33.143.215) 56(84) bytes of data.

--- 10.33.143.215 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

Yuming

On 4/10/15, 4:51 PM, "Josh Durgin" <jdurgin@xxxxxxxxxx> wrote:

>On 04/08/2015 09:37 PM, Yuming Ma (yumima) wrote:
>> Josh,
>>
>> I think we are using plain live migration and not mirroring block drives
>> as the other test did.
>
>Do you have the migration flags or more from the libvirt log? Also,
>which version of qemu is this?
>
>The libvirt log message about qemuMigrationCancelDriveMirror from your
>first email is suspicious. Being unable to stop it may mean it was not
>running (fine, but libvirt shouldn't have tried to stop it), or that it
>kept running (bad, especially if it's trying to copy to the same rbd).
>
>> What are the chances, or the scenario, in which the disk image
>> can be corrupted during live migration when both source and target
>> are connected to the same volume and RBD caching is turned on:
>
>Generally, rbd caching with live migration is safe. The way to get
>corruption is to have drive-mirror try to copy over the rbd on the
>destination while the source is still using the disk...
>
>Did you observe fs corruption after a live migration, or just other odd
>symptoms? Since a reboot fixed it, it sounds more like memory corruption
>to me, unless the filesystem was fsck'd during the reboot.
>
>Josh
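
Josh's questions above (the migration flags, the qemu/libvirt versions, and whether
drive-mirror was involved) can be checked on the source hypervisor. A minimal sketch
of the commands, assuming a Kilo-era nova libvirt driver on RPM-based hosts with
default log and admin-socket paths, all of which may differ in a given deployment:

# libvirt live-migration flags nova is configured with (option name from Kilo-era nova)
grep -i live_migration_flag /etc/nova/nova.conf

# qemu and libvirt versions on source and destination
rpm -q qemu-kvm libvirt
virsh version

# confirm whether rbd caching is actually off for the qemu client
# (requires an admin socket configured for the client in ceph.conf; socket path is an example)
ls /var/run/ceph/
ceph --admin-daemon /var/run/ceph/<client-socket>.asok config show | grep rbd_cache

# look for drive-mirror activity around the migration in the libvirt log
grep -i -E 'drive-mirror|qemuMigrationCancelDriveMirror' /var/log/libvirt/libvirtd.log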