Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

Yoshiaki Tamura <tamura.yoshiaki@xxxxxxxxxxxxx> · Tue, 4 Jan 2011 21:20:53 +0900

2011/1/4 Michael S. Tsirkin <mst@xxxxxxxxxx>:
> On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
>> 2010/11/29 Stefan Hajnoczi <stefanha@xxxxxxxxx>:
>> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
>> > <tamura.yoshiaki@xxxxxxxxxxxxx> wrote:
>> >> event-tap controls when to start an FT transaction, and provides proxy
>> >> functions to be called from net/block devices.  During an FT transaction,
>> >> it queues up net/block requests, and flushes them when the transaction
>> >> completes.
>> >>
>> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@xxxxxxxxxxxxx>
>> >> Signed-off-by: OHMURA Kei <ohmura.kei@xxxxxxxxxxxxx>
>> >> ---
>> >>  Makefile.target |    1 +
>> >>  block.h         |    9 +
>> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>  event-tap.h     |   34 +++
>> >>  net.h           |    4 +
>> >>  net/queue.c     |    1 +
>> >>  6 files changed, 843 insertions(+), 0 deletions(-)
>> >>  create mode 100644 event-tap.c
>> >>  create mode 100644 event-tap.h
>> >
>> > event_tap_state is checked at the beginning of several functions.  If
>> > there is an unexpected state the function silently returns.  Should
>> > these checks really be assert() so there is an abort and backtrace if
>> > the program ever reaches this state?
>> >
>> >> +typedef struct EventTapBlkReq {
>> >> +    char *device_name;
>> >> +    int num_reqs;
>> >> +    int num_cbs;
>> >> +    bool is_multiwrite;
>> >
>> > Is multiwrite logging necessary?  If event tap is called from within
>> > the block layer then multiwrite is turned into one or more
>> > bdrv_aio_writev() calls.
>> >
>> >> +static void event_tap_replay(void *opaque, int running, int reason)
>> >> +{
>> >> +    EventTapLog *log, *next;
>> >> +
>> >> +    if (!running) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    event_tap_state = EVENT_TAP_REPLAY;
>> >> +
>> >> +    QTAILQ_FOREACH(log, &event_list, node) {
>> >> +        EventTapBlkReq *blk_req;
>> >> +
>> >> +        /* event resume */
>> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> >> +        case EVENT_TAP_NET:
>> >> +            event_tap_net_flush(&log->net_req);
>> >> +            break;
>> >> +        case EVENT_TAP_BLK:
>> >> +            blk_req = &log->blk_req;
>> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
>> >> +                switch (log->ioport.index) {
>> >> +                case 0:
>> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
>> >> +                    break;
>> >> +                case 1:
>> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
>> >> +                    break;
>> >> +                case 2:
>> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
>> >> +                    break;
>> >> +                }
>> >> +            } else {
>> >> +                /* EVENT_TAP_MMIO */
>> >> +                cpu_physical_memory_rw(log->mmio.address,
>> >> +                                       log->mmio.buf,
>> >> +                                       log->mmio.len, 1);
>> >> +            }
>> >> +            break;
>> >
>> > Why are net tx packets replayed at the net level but blk requests are
>> > replayed at the pio/mmio level?
>> >
>> > I expected everything to replay either as pio/mmio or as net/block.
>>
>> Stefan,
>>
>> After doing some heavy load tests, I realized that we have to
>> take a hybrid approach to replay for now.  This is because the
>> point at which a device moves to the next state (e.g. virtio
>> decreases inuse) differs between net and block.  For example,
>> virtio-net decreases inuse upon returning from the net layer,
>> but virtio-blk does that inside of the callback.
>
> For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
> For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
> Both are invoked from a callback.
>
>> If we only use pio/mmio
>> replay, even though event-tap tries to replay net requests, some
>> get lost because the state has proceeded already.
>
> It seems that all you need to do to avoid this is to
> delay the callback?

Yeah, if it's possible.  But if you take a look at virtio-net,
you'll see that virtqueue_push is called immediately after calling
qemu_sendv_packet, while virtio-blk does that in the callback.
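
To illustrate the timing difference I mean, here is a toy model.  It
is not QEMU code: ToyDevice and every toy_* function are made-up names,
and the backend is reduced to a single deferred callback.  The net-like
path advances device state as soon as the send call returns, while the
blk-like path advances it only when the completion callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical, not QEMU APIs. */
typedef void (*ToyCompletion)(void *opaque);

typedef struct ToyDevice {
    int inuse;                 /* buffers the guest still considers in flight */
    ToyCompletion pending_cb;  /* completion deferred by the backend, if any */
    void *pending_opaque;
} ToyDevice;

/* Advance device state: one buffer is handed back to the guest. */
static void toy_complete(void *opaque)
{
    ToyDevice *dev = opaque;
    assert(dev->inuse > 0);
    dev->inuse--;
}

/* The backend records the request and returns; completion comes later. */
static void toy_backend_submit(ToyDevice *dev, ToyCompletion cb, void *opaque)
{
    dev->pending_cb = cb;
    dev->pending_opaque = opaque;
}

/* Run the deferred completion, as the backend would when I/O finishes. */
static void toy_backend_finish(ToyDevice *dev)
{
    if (dev->pending_cb) {
        ToyCompletion cb = dev->pending_cb;
        dev->pending_cb = NULL;
        cb(dev->pending_opaque);
    }
}

/* net-like: state advances as soon as the send call returns. */
static void toy_net_tx(ToyDevice *dev)
{
    dev->inuse++;
    toy_backend_submit(dev, NULL, NULL);
    toy_complete(dev);
}

/* blk-like: state advances only inside the completion callback. */
static void toy_blk_write(ToyDevice *dev)
{
    dev->inuse++;
    toy_backend_submit(dev, toy_complete, dev);
}
```

After toy_net_tx() returns, inuse is already back to its old value, so
a pio/mmio-level replay would find the state moved on.  After
toy_blk_write() returns, inuse stays raised until toy_backend_finish()
runs, so the old state is still there to replay from.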

>
>> This doesn't
>> happen with block, because the state is still old enough to
>> replay.  Note that using the hybrid approach won't cause duplicated
>> requests on the secondary.
>
> An assumption devices make is that a buffer is unused once
> completion callback was invoked. Does this violate that assumption?

No, it shouldn't.  In the case of net with net layer replay, we copy
the content of the requests, and in the case of block, because we
haven't called the callback yet, the requests remain intact.
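
A rough sketch of the two ownership rules, under the assumption that a
log entry is just a pointer plus a length.  ToyLog and the toy_log_*
helpers are hypothetical names and do not match EventTapLog in the
patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical log entry; not the EventTapLog from the patch. */
typedef struct ToyLog {
    unsigned char *data;  /* payload to replay */
    size_t len;
    int owns_data;        /* 1 if data is a private copy */
} ToyLog;

/* net-style: take a private copy, since the device may reuse its buffer. */
static int toy_log_net(ToyLog *log, const unsigned char *buf, size_t len)
{
    log->data = malloc(len ? len : 1);
    if (!log->data) {
        return -1;
    }
    memcpy(log->data, buf, len);
    log->len = len;
    log->owns_data = 1;
    return 0;
}

/* blk-style: borrow the buffer; valid only until the completion callback. */
static void toy_log_blk(ToyLog *log, unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    log->data = buf;
    log->len = len;
    log->owns_data = 0;
}

/* Release a log entry, freeing the payload only if it was copied. */
static void toy_log_free(ToyLog *log)
{
    if (log->owns_data) {
        free(log->data);
    }
    log->data = NULL;
    log->len = 0;
    log->owns_data = 0;
}
```

With the copy, the device can reuse its buffer after completion without
affecting what gets replayed.  With the borrowed pointer, the entry has
to be dropped or flushed before the callback is invoked.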

Yoshi

>
> --
> MST
>
>