Hi,

On 10/19/2010 09:10 AM, Amit Shah wrote:
> On (Tue) Oct 19 2010 [08:55:16], Hans de Goede wrote:
>> Hi,
>>
>> On 10/19/2010 07:45 AM, Amit Shah wrote:
>>> If the host is slow in reading data or doesn't read data at all,
>>> blocking write calls not only blocked the program that called write()
>>> but the entire guest itself.
>>>
>>> To overcome this, let's not block till the host signals it has given
>>> back the virtio ring element we passed it. Instead, send the buffer to
>>> the host and return to userspace. This operation then becomes similar
>>> to how non-blocking writes work, so let's use the existing code for this
>>> path as well.
>>>
>>> This code change also ensures blocking write calls do get blocked if
>>> there's not enough room in the virtio ring as well as they don't return
>>> -EAGAIN to userspace.
>>>
>>> Signed-off-by: Amit Shah <amit.shah@xxxxxxxxxx>
>>> CC: stable@xxxxxxxxxx
>>> ---
>>>  drivers/char/virtio_console.c |   17 ++++++++++++++---
>>>  1 files changed, 14 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
>>> index c810481..0f69c5e 100644
>>> --- a/drivers/char/virtio_console.c
>>> +++ b/drivers/char/virtio_console.c
>>> @@ -459,9 +459,12 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
>>>
>>>  	/*
>>>  	 * Wait till the host acknowledges it pushed out the data we
>>> -	 * sent. This is done for ports in blocking mode or for data
>>> -	 * from the hvc_console; the tty operations are performed with
>>> -	 * spinlocks held so we can't sleep here.
>>> +	 * sent. This is done for data from the hvc_console; the tty
>>> +	 * operations are performed with spinlocks held so we can't
>>> +	 * sleep here. An alternative would be to copy the data to a
>>> +	 * buffer and relax the spinning requirement. The downside is
>>> +	 * we need to kmalloc a GFP_ATOMIC buffer each time the
>>> +	 * console driver writes something out.
>>>  	 */
>>>  	while (!virtqueue_get_buf(out_vq, &len))
>>>  		cpu_relax();
>>> @@ -626,6 +629,14 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
>>>  		goto free_buf;
>>>  	}
>>>
>>> +	/*
>>> +	 * We now ask send_buf() to not spin for generic ports -- we
>>> +	 * can re-use the same code path that non-blocking file
>>> +	 * descriptors take for blocking file descriptors since the
>>> +	 * wait is already done and we're certain the write will go
>>> +	 * through to the host.
>>> +	 */
>>> +	nonblock = true;
>>>  	ret = send_buf(port, buf, count, nonblock);
>>>
>>>  	if (nonblock && ret > 0)
>>
>> 1) Hmm, this changes the code to kfree the buffer, but only if the send_buf
>> succeeded (which it always should given we did a will_block check first).
>
> The change is to *not* free the buffer. It will be freed later when the
> host indicates it's done with it (happens in reclaim_consumed_buffers()).
>

Ah, thanks for explaining that.

>> I cannot help but notice that the data was not freed on a blocking fd
>> before this patch, but is freed now. And I see nothing in send_buf to make
>> it take ownership of the buffer / free it in the blocking case, and not take
>> ownership in the blocking case. More over if anything I would expect send_buf
>> to take ownership in the non blocking case (as the data is not directly
>> consumed there), and not take owner ship in the blocking case, but the check
>> is the reverse. Also why is the buffer not freed if the write failed, that
>> makes no sense.
>
> The buffer used to be freed in the blocking case, as we knew for certain
> the host was done with the buffer. Now it's not, we'll free it later.
>
>> 2) Assuming that things are changed so that send_buf does take ownership of the
>> buffer in the nonblocking case, shouldn't the buffer then be allocated
>> with GFP_ATOMIC?
>
> Why? We're not called from irq context.
>

Ok, my bad.

>> 3) This patch will cause processes filling the virtqueue fast enough to block
>> to never wake up again, due to a missing waitqueue wakeup, see:
>> https://bugzilla.redhat.com/show_bug.cgi?id=643750
>
> Doesn't happen in my testcase, but this patch shouldn't cause that
> problem if it exists -- it's a problem that exists even now for
> nonblocking ports. So if such a bug exists, it needs to be fixed
> independently.

First of all, let's agree that this is a real problem: there is simply nothing
waking the waitqueue where fops_write (or poll) blocks when buffers become
available in out_vq. It may be hard to come up with a test case which fills the
queue fast enough to hit this scenario, but it is very real.

I agree it is an independent problem and should be fixed in a separate patch,
but that patch should be part of the same set and come *before* this one, as
this patch now extends the problem to ports opened in blocking mode too.

BTW, many thanks for working on this, it is appreciated :)

Regards,

Hans

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/virtualization
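[Note on the reclaim path discussed in point 1. The email names
reclaim_consumed_buffers() but does not show it; the sketch below is a
reconstruction of what such a reclaim loop looks like, assuming the driver's
port->out_vq and an outvq_full flag consulted by the will_block check Hans
mentions. It is illustrative only, not the exact code under review.]

/*
 * Sketch: free output buffers only once the host has handed them back.
 * This is why port_fops_write() must not kfree() a buffer after a
 * successful send_buf() in the non-spinning path -- ownership has moved
 * to the host until virtqueue_get_buf() returns it here.
 */
static void reclaim_consumed_buffers(struct port *port)
{
	void *buf;
	unsigned int len;

	while ((buf = virtqueue_get_buf(port->out_vq, &len))) {
		kfree(buf);
		/* Assumed flag: there is room in the out_vq again, so a
		 * subsequent will_block-style check can let writers through. */
		port->outvq_full = false;
	}
}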
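[Note on the missing wakeup in point 3. One possible shape of a fix, sketched
along the lines of the driver's existing in_intr() handler: give the output
virtqueue a callback that wakes the port's waitqueue whenever the host returns
buffers, so writers blocked in fops_write (and pollers) re-check for space.
The callback wiring at virtqueue-creation time is not shown, and this is a
sketch of the idea Hans asks for, not the patch that was actually posted.]

/*
 * Sketch: wake up anyone waiting for room in the out_vq when the host
 * notifies us that it has consumed buffers.
 */
static void out_intr(struct virtqueue *vq)
{
	struct port *port;

	port = find_port_by_vq(vq->vdev->priv, vq);
	if (!port)
		return;

	/* Blocked writers and pollers sleep on port->waitqueue; wake them
	 * so they can reclaim consumed buffers and retry the write. */
	wake_up_interruptible(&port->waitqueue);
}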