On Wed, Jun 27, 2012 at 08:41:49AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 27, 2012 at 8:39 AM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> > On Tue, Jun 26, 2012 at 8:34 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
> >> net.txt
> >> --------
> >>
> >>
> >> iothread flow
> >> =============
> >>
> >> 1) Skip-work-if-device-locked
> >>
> >> select(tap fd ready)
> >>   tap_send
> >>     if (trylock(TAPState->NetClientState->dev))
> >>         proceed;
> >>     else
> >>         continue; (data remains in queue)
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive.
> >> DRAWBACK: VCPU threads can starve the IOTHREAD (can be fixed with
> >> a scheme such as trylock() marking a flag saying "iothread wants lock").
> >
> > One alternative is to say the lock cannot be held across system calls
> > or other long-running operations (similar to interrupt handler
> > spinlocks in the kernel).  But QEMU threads are still at the mercy of
> > the scheduler or page faults, so perhaps this is not a good idea
> > because waiting on the lock could tie up the iothread.
> >
> >> 2) Queue-work-to-vcpu-context
> >>
> >> select(tap fd ready)
> >>   tap_send
> >>     if (trylock(TAPState->NetClientState->dev)) {
> >>         proceed;
> >>     } else {
> >>         AddToDeviceWorkQueue(work);
> >>     }
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive
> >> DRAWBACK: cache effects
> >
> > This almost suggests we should invert packet reception for NICs.  tap
> > fd ready should add a work item and the guest device calls into
> > net/tap.c to pull out any packets from the work function:
> >
> > tap_send
> >   dev_add_work(work);
> >
> > virtio_net_rx_work_fn
> >   while ((req = virtqueue_pop(rx_queue))) {
> >       if (!net_receive_packet(netclient, req->buf)) {
> >           break; /* no more packets available */
> >       }
> >       /* ...mark req complete and push onto virtqueue... */
> >   }
> >   virtqueue_notify(rx_queue);
> >
> > The idea is to avoid copies on the rx path and make the locking simple
> > by always deferring to a work function (which can be invoked
> > immediately if the device isn't locked).
>
> I just realized this approach bypasses net/queue.c.  I think it works
> really well for the peer model (no "VLANs").  Basically flow control
> is dictated by the peer (receiver) and we don't need to use NetQueue
> anymore.  Whether this works elegantly for slirp and other backends,
> I'm not sure yet.
>
> Stefan

Right. The advantage is that only backends where performance matters
need to be converted (and likewise only the net hw drivers that matter
performance-wise).

Apparently you guys have very different ideas about the higher-level
model here. It would be good to agree on one structure (including
modifications to the one above) and then share the work in that
direction. Having competing implementations in this case is a waste of
time.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html