On Wed, Nov 25, 2020 at 02:33:44PM +0100, Christian Ehrhardt wrote:
> On Wed, Nov 25, 2020 at 1:38 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 25, 2020 at 01:28:09PM +0100, Christian Ehrhardt wrote:
> > > On Wed, Nov 25, 2020 at 10:55 AM Christian Ehrhardt
> > > <christian.ehrhardt@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Nov 24, 2020 at 4:30 PM Peter Krempa <pkrempa@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Nov 24, 2020 at 16:05:53 +0100, Christian Ehrhardt wrote:
> > > > > > Hi,
> > > > >
> > > > > [...]
> > > >
> > > > BTW to reduce the scope what to think about - I have rebuilt 6.8 as
> > > > well it works.
> > > > Thereby I can confirm that the offending change should be in between
> > > > 6.8.0 -> 6.9.0.
> > >
> > > I was able to get this working in git bisect builds from git between
> > > v6.8 / v6.9.
> > > I identified the following offending commit:
> > > 7d959c30 rpc: Fix virt-ssh-helper detection
> > >
> > > Ok that makes a bit of sense, first we had in 6.8
> > > f8ec7c84 rpc: use new virt-ssh-helper binary for remote tunnelling
> > > That makes it related to tunneling which matches our broken use-case.
> > >
> > > The identified commit "7d959c30 rpc: Fix virt-ssh-helper detection" might
> > > finally really enable the new helper and that is then broken?
> > >
> > > With that knowledge I was able to confirm that it really is the native mode
> > >
> > > $ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test
> > > qemu+ssh://testkvm-hirsute-to/system?proxy=netcat
> > > <works>
> > > $ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test
> > > qemu+ssh://testkvm-hirsute-to/system?proxy=native
> > > <hangs>
> > >
> > > I recently discussed with Andrea if we'd need apparmor rules for
> > > virt-ssh-helper,
> > > but there are no denials nor libvirt log entries related to virt-ssh-helper.
> > > But we don't need such rules since it is spawned on the ssh login and
> > > not under libvirtd itself.
> > >
> > > PS output of the hanging receiving virt-ssh-helper (looks not too unhappy):
> > > Source:
> > > 4 0 41305 1 20 0 1627796 23360 poll_s Ssl ?
> > > 0:05 /usr/sbin/libvirtd
> > > 0 0 41523 41305 20 0 9272 4984 poll_s S ?
> > > 0:02 \_ ssh -T -e none -- testkvm-hirsute-to sh -c 'virt-ssh-helper
> > > 'qemu:///system''
> > > Target
> > > 4 0 213 1 20 0 13276 4132 poll_s Ss ?
> > > 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 250-500 startups
> > > 4 0 35148 213 20 0 19048 11320 poll_s Ss ?
> > > 0:02 \_ sshd: root@notty
> > > 4 0 35206 35148 20 0 2584 544 do_wai Ss ?
> > > 0:00 \_ sh -c virt-ssh-helper qemu:///system
> > > 0 0 35207 35206 20 0 81348 26684 - R ?
> > > 0:34 \_ virt-ssh-helper qemu:///system
> > >
> > > I've looked at it with strace [1] and gdb for backtraces [2] - it is
> > > not dead or stuck and keeps working.
> > > Could it be just so slow that it appears to hang until it times out?
> > > Or is the event mechanism having issues and it wakes up too rarely?
> >
> > Lets take migration out of the picture. What if you simply do
> >
> > virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=native list
> >
> > does that work ?
>
> Yes it does, no hang and proper results

Ok, so that shows virt-ssh-helper is not completely broken at least.

That makes me think something in the streams code is causing the issue.

You might try the virsh "console" or "vol-upload" commands to test the
streams code in isolation. If those also work, then the problem is
specific to migration, and we'll probably want to collect debug level
logs from the src and dst hosts.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|