Re: [Qemu-devel] [PATCH] kvm: First step to push iothread lock out of inner run loop

Stefan Hajnoczi <stefanha@xxxxxxxxx> · Thu, 28 Jun 2012 09:45:55 +0100

On Wed, Jun 27, 2012 at 12:19 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Wed, Jun 27, 2012 at 08:41:49AM +0100, Stefan Hajnoczi wrote:
>> On Wed, Jun 27, 2012 at 8:39 AM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
>> > On Tue, Jun 26, 2012 at 8:34 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>> >> On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
>> >> net.txt
>> >> --------
>> >>
>> >>
>> >> iothread flow
>> >> =============
>> >>
>> >> 1) Skip-work-if-device-locked
>> >>
>> >> select(tap fd ready)
>> >> tap_send
>> >>    if (trylock(TAPState->NetClientState->dev))
>> >>        proceed;
>> >>    else
>> >>        continue; (data remains in queue)
>> >>    tap_read_packet
>> >>    qemu_send_packet_async
>> >>
>> >> DRAWBACK: lock intensive.
>> >> DRAWBACK: VCPU threads can starve IOTHREAD (can be fixed with
>> >> a scheme such as trylock() marks a flag saying "iothread wants lock".
>> >
>> > One alternative is to say the lock cannot be held across system calls
>> > or other long-running operations (similar to interrupt handler
>> > spinlocks in the kernel).  But QEMU threads are still at the mercy of
>> > the scheduler or page faults, so perhaps this is not a good idea
>> > because waiting on the lock could tie up the iothread.
>> >
>> >> 2) Queue-work-to-vcpu-context
>> >>
>> >> select(tap fd ready)
>> >> tap_send
>> >>    if (trylock(TAPState->NetClientState->dev)) {
>> >>        proceed;
>> >>    } else {
>> >>        AddToDeviceWorkQueue(work);
>> >>    }
>> >>    tap_read_packet
>> >>    qemu_send_packet_async
>> >>
>> >> DRAWBACK: lock intensive
>> >> DRAWBACK: cache effects
>> >
>> > This almost suggests we should invert packet reception for NICs.  tap
>> > fd ready should add a work item and the guest device calls into
>> > net/tap.c to pull out any packets from the work function:
>> >
>> > tap_send
>> >    dev_add_work(work);
>> >
>> > virtio_net_rx_work_fn
>> >    while ((req = virtqueue_pop(rx_queue))) {
>> >        if (!net_receive_packet(netclient, req->buf)) {
>> >            break; /* no more packets available */
>> >        }
>> >        /* ...mark req complete and push onto virtqueue... */
>> >    }
>> >    virtqueue_notify(rx_queue);
>> >
>> > The idea is to avoid copies on the rx path and make the locking simple
>> > by always deferring to a work function (which can be invoked
>> > immediately if the device isn't locked).
>>
>> I just realized this approach bypasses net/queue.c.  I think it works
>> really well for the peer model (no "VLANs").  Basically flow control
>> is dictated by the peer (receiver) and we don't need to use NetQueue
>> anymore.  Whether this works elegantly for slirp and other backends,
>> I'm not sure yet.
>>
>> Stefan
>
> Right. The advantage is only backends where performance matters need to
> be converted (and only net hw drivers that matter performance wise need
> to be converted).
>
> Apparently you guys have very different ideas on the higher level
> model here. It would be good to agree on one structure (including
> modifications on the one above) to follow and share the work on that
> direction.

"You guys" == Ping Fan + Jan + Anthony + Avi?  I'm not working
directly on QBL removal myself.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html