Re: Network hangs when communicating with host

On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara <andre.przywara@xxxxxxx> wrote:
> Hi Dmitry,
>
> On 19/10/15 10:05, Dmitry Vyukov wrote:
>> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>>> Hello,
>>>>
>>>> I am trying to run a program in lkvm sandbox so that it communicates
>>>> with a program on host. I run lkvm as:
>>>>
>>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>>
>>>> /my_prog then connects to a program on host over a tcp socket.
>>>> I see that host receives some data, sends some data back, but then
>>>> my_prog hangs on network read.
>>>>
>>>> To localize this I wrote 2 programs (attached). ping is run on host
>>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>>> connection, but after some iterations both hang on read.
>>>>
>>>> Networking code in Go runtime is there for more than 3 years, widely
>>>> used in production and does not have any known bugs. However, it uses
>>>> epoll edge-triggered readiness notifications that are known to be tricky.
>>>> Is it possible that lkvm contains some networking bug? Can it be
>>>> related to the data races in lkvm I reported earlier today?
>
> Just to let you know:
> I think we have seen networking issues in the past - root over NFS had
> issues IIRC. Will spent some time on debugging this and it looked like a
> race condition in kvmtool's virtio implementation. I think pinning
> kvmtool's virtio threads to one host core made this go away. However
> although he tried hard (even by Will's standards!) he couldn't find
> the real root cause or a fix at the time he looked at it and we found
> other ways to work around the issues (using virtio-blk or initrd's).
>
> So it's quite possible that there are issues. I haven't had time yet to
> look at your sanitizer reports, but it looks like a promising approach
> to find the root cause.


Thanks, Andre!

ping/pong runs for at least 5 minutes without hanging when I run lkvm
under taskset 1 (that is, pinned to a single host CPU).

And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
can point you to the bug within a minute, so you just need to say
"aha! it is here". Or maybe not. There are no guarantees. But if you
have already spent significant time on this, then checking the reports
definitely looks like a good idea.