Re: Network hangs when communicating with host

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dmitry,

On 19/10/15 10:05, Dmitry Vyukov wrote:
> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I am trying to run a program in lkvm sandbox so that it communicates
>>> with a program on host. I run lkvm as:
>>>
>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>
>>> /my_prog then connects to a program on host over a tcp socket.
>>> I see that host receives some data, sends some data back, but then
>>> my_prog hangs on network read.
>>>
>>> To localize this I wrote 2 programs (attached). ping is run on host
>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>> connection, but after some iterations both hang on read.
>>>
>>> Networking code in Go runtime is there for more than 3 years, widely
>>> used in production and does not have any known bugs. However, it uses
>>> epoll edge-triggered readiness notifications that known to be tricky.
>>> Is it possible that lkvm contains some networking bug? Can it be
>>> related to the data races in lkvm I reported earlier today?

Just to let you know:
I think we have seen networking issues in the past - root over NFS had
issues IIRC. Will spent some time on debugging this and it looked like a
race condition in kvmtool's virtio implementation. I think pinning
kvmtool's virtio threads to one host core made this go away. However
although he tried hard (even by Will's standards!) he couldn't find a
the real root cause or a fix at the time he looked at it and we found
other ways to work around the issues (using virtio-blk or initrd's).

So it's quite possible that there are issues. I haven't had time yet to
look at your sanitizer reports, but it looks like a promising approach
to find the root cause.

Cheers,
Andre.

>>>
>>> I am on commit 3695adeb227813d96d9c41850703fb53a23845eb.
>>
>> Hey Dmitry,
>>
>> How long does it take to reproduce? I've been running ping/pong as you've
>> described and it looks like it doesn't get stuck (read/writes keep going
>> on both sides).
>>
>> Can you share your guest kernel config maybe?
> 
> 
> Humm.... it my setup it happens within a minute or so.
> 
> My kernel is not completely standard, but it works with qemu without
> any problems.
> It is not trivial to reproduce, but FWIW I on commit
> f9fbf6b72ffaaca8612979116c872c9d5d9cc1f5 of
> https://github.com/dvyukov/linux/commits/coverage branch. Config file
> is attached. Then, I build it with custom gcc: revision 228818 +
> https://codereview.appspot.com/267910043 patch. This is all per
> https://github.com/google/syzkaller instructions.
> 
> I run lkvm as:
> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
> /arch/x86/boot/bzImage --network mode=user -- /pong
> 
> kvmtool is on 3695adeb227813d96d9c41850703fb53a23845eb.
> 
> Just tried to do the same with qemu, it does not hang.
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux