> Hi,
>
> I have a two-node cluster with several domains as resources. During testing I
> tried several times to migrate some domains concurrently.
> Usually it succeeded, but rarely it failed. I found one clue in the log:
>
> Dec 03 16:03:02 ha-idg-1 libvirtd[3252]: 2018-12-03 15:03:02.758+0000: 3252:
> error : virKeepAliveTimerInternal:143 : internal error: connection closed due
> to keepalive timeout
>
> The domains are configured similarly:
> primitive vm_geneious VirtualDomain \
>         params config="/mnt/san/share/config.xml" \
>         params hypervisor="qemu:///system" \
>         params migration_transport=ssh \
>         op start interval=0 timeout=120 trace_ra=1 \
>         op stop interval=0 timeout=130 trace_ra=1 \
>         op monitor interval=30 timeout=25 trace_ra=1 \
>         op migrate_from interval=0 timeout=300 trace_ra=1 \
>         op migrate_to interval=0 timeout=300 trace_ra=1 \
>         meta allow-migrate=true target-role=Started is-managed=true \
>         utilization cpu=2 hv_memory=8000
>
> What is the algorithm to discover the port used for live migration?
> I have the impression that "params migration_transport=ssh" is worthless, port
> 22 isn't involved in live migration.
> My experience is that for the migration tcp ports > 49151 are used. But the
> exact procedure isn't clear to me.
> Does live migration use tcp port 49152 first and, for each following domain, one
> port higher?
> E.g. for the concurrent live migration of three domains 49152, 49153 and 49154.
>
> Why does live migration of three domains usually succeed, although on both
> hosts just 49152 and 49153 are open?
> Is the migration not really concurrent, but sometimes sequential?
>
> Bernd
>

Hi,

I tried to narrow down the problem.
My first assumption was that something is wrong with the network between the hosts.
I opened ports 49152 - 49172 in the firewall - problem persisted.
So I deactivated the firewall on both nodes - problem persisted.

Then I wanted to exclude the HA cluster software (pacemaker).
I unmanaged the VirtualDomains in pacemaker and migrated them with virsh - problem persists.
I wrote a script to migrate three domains sequentially from host A to host B and vice versa via virsh.
I raised the log level of libvirtd and found something in the log which may be the culprit.

This is the output of my script:

Thu Dec 6 17:02:53 CET 2018
migrate sim
Migration: [100 %]
Thu Dec 6 17:03:07 CET 2018
migrate geneious
Migration: [100 %]
Thu Dec 6 17:03:16 CET 2018
migrate mausdb
Migration: [ 99 %]error: operation failed: migration job: unexpectedly failed    <===== error !
Thu Dec 6 17:05:32 CET 2018    <======== time of error
Guests on ha-idg-1: \n
 Id    Name                           State
----------------------------------------------------
 1     sim                            running
 2     geneious                       running
 -     mausdb                         shut off

migrate to ha-idg-2\n
Thu Dec 6 17:05:32 CET 2018

This is what journalctl told (newest entry first):

Dec 06 17:05:32 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:32.481+0000: 12553: info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 client=0x55b2bb930d50 countToDeath=0 idle=30
Dec 06 17:05:32 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:32.481+0000: 12553: error : virKeepAliveTimerInternal:143 : internal error: connection closed due to keepalive timeout
Dec 06 17:05:32 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:32.481+0000: 12553: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x55b2bb937740
Dec 06 17:05:27 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:27.476+0000: 12553: info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 client=0x55b2bb930d50 countToDeath=1 idle=25
Dec 06 17:05:27 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:27.476+0000: 12553: info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1
Dec 06 17:05:22 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:22.471+0000: 12553: info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 client=0x55b2bb930d50 countToDeath=2 idle=20
Dec 06 17:05:22 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:22.471+0000: 12553: info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1
Dec 06 17:05:17 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:17.466+0000: 12553: info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 client=0x55b2bb930d50 countToDeath=3 idle=15
Dec 06 17:05:17 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:17.466+0000: 12553: info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1
Dec 06 17:05:12 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:12.460+0000: 12553: info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 client=0x55b2bb930d50 countToDeath=4 idle=10
Dec 06 17:05:12 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:12.460+0000: 12553: info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1
Dec 06 17:05:07 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:07.455+0000: 12553: info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 client=0x55b2bb930d50 countToDeath=5 idle=5
Dec 06 17:05:07 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:07.455+0000: 12553: info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1

There seems to be a kind of countdown. From googling I found that this may be related to libvirtd.conf:

# Keepalive settings for the admin interface
#admin_keepalive_interval = 5
#admin_keepalive_count = 5

What is meant by the "admin interface"? virsh?
What is meant by "client" in libvirtd.conf? virsh?
Why do I have regular timeouts although my two hosts are very powerful? 128 GB RAM, 16 cores, two 1 GBit/s network adapters on each host in bonding.
During migration I don't see much load, and there is almost no waiting for IO.
Should I set admin_keepalive_interval to -1?
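
The countdown in the log above can be reproduced with a little arithmetic. This is only a sketch, assuming an interval of 5 seconds and a count of 5 (the values in the libvirtd.conf excerpt; whether the admin_* settings are really the ones that govern this connection is not confirmed here). The function names are made up for illustration and are not part of libvirt.

```shell
#!/bin/sh
# Sketch only: models the keepalive countdown seen in the journalctl output.
# Assumption: the connection is closed once it has been idle for
# interval * (count + 1) seconds, which matches idle=30 at countToDeath=0.

# Print the idle time (in seconds) at which the connection is closed.
keepalive_deadline() {
    interval=$1
    count=$2
    echo $(( interval * (count + 1) ))
}

# Print one line per timer tick, in chronological order
# (the journalctl output above is newest first).
keepalive_countdown() {
    interval=$1
    count=$2
    remaining=$count
    idle=$interval
    while [ "$remaining" -ge 0 ]; do
        echo "countToDeath=$remaining idle=$idle"
        remaining=$(( remaining - 1 ))
        idle=$(( idle + interval ))
    done
}

keepalive_countdown 5 5
echo "connection closed after $(keepalive_deadline 5 5) seconds idle"
```

With 5 and 5 this gives the same six ticks as the log, from countToDeath=5 idle=5 down to countToDeath=0 idle=30, i.e. the peer did not answer for 30 seconds before the connection was closed.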

Bernd

Helmholtz Zentrum Muenchen