Re: [PATCH 09/21] Introduce event-tap.

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Tue, 4 Jan 2011 13:19:08 +0200

On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
> 2010/11/29 Stefan Hajnoczi <stefanha@xxxxxxxxx>:
> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
> > <tamura.yoshiaki@xxxxxxxxxxxxx> wrote:
> >> event-tap controls when to start an FT transaction, and provides proxy
> >> functions to be called from net/block devices.  During an FT transaction,
> >> it queues up net/block requests, and flushes them when the transaction
> >> completes.
> >>
> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@xxxxxxxxxxxxx>
> >> Signed-off-by: OHMURA Kei <ohmura.kei@xxxxxxxxxxxxx>
> >> ---
> >>  Makefile.target |    1 +
> >>  block.h         |    9 +
> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  event-tap.h     |   34 +++
> >>  net.h           |    4 +
> >>  net/queue.c     |    1 +
> >>  6 files changed, 843 insertions(+), 0 deletions(-)
> >>  create mode 100644 event-tap.c
> >>  create mode 100644 event-tap.h
> >
> > event_tap_state is checked at the beginning of several functions.  If
> > there is an unexpected state the function silently returns.  Should
> > these checks really be assert() so there is an abort and backtrace if
> > the program ever reaches this state?
> >
> >> +typedef struct EventTapBlkReq {
> >> +    char *device_name;
> >> +    int num_reqs;
> >> +    int num_cbs;
> >> +    bool is_multiwrite;
> >
> > Is multiwrite logging necessary?  If event tap is called from within
> > the block layer then multiwrite is turned into one or more
> > bdrv_aio_writev() calls.
> >
> >> +static void event_tap_replay(void *opaque, int running, int reason)
> >> +{
> >> +    EventTapLog *log, *next;
> >> +
> >> +    if (!running) {
> >> +        return;
> >> +    }
> >> +
> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
> >> +        return;
> >> +    }
> >> +
> >> +    event_tap_state = EVENT_TAP_REPLAY;
> >> +
> >> +    QTAILQ_FOREACH(log, &event_list, node) {
> >> +        EventTapBlkReq *blk_req;
> >> +
> >> +        /* event resume */
> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
> >> +        case EVENT_TAP_NET:
> >> +            event_tap_net_flush(&log->net_req);
> >> +            break;
> >> +        case EVENT_TAP_BLK:
> >> +            blk_req = &log->blk_req;
> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
> >> +                switch (log->ioport.index) {
> >> +                case 0:
> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                case 1:
> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                case 2:
> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                }
> >> +            } else {
> >> +                /* EVENT_TAP_MMIO */
> >> +                cpu_physical_memory_rw(log->mmio.address,
> >> +                                       log->mmio.buf,
> >> +                                       log->mmio.len, 1);
> >> +            }
> >> +            break;
> >
> > Why are net tx packets replayed at the net level but blk requests are
> > replayed at the pio/mmio level?
> >
> > I expected everything to replay either as pio/mmio or as net/block.
> 
> Stefan,
> 
> After doing some heavy load tests, I realized that we have to
> take a hybrid approach to replay for now.  This is because the
> point at which a device moves to the next state (e.g. virtio
> decreases inuse) differs between net and block.  For example,
> virtio-net decreases inuse upon returning from the net layer,
> but virtio-blk does that inside of the callback.

For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
Both are invoked from a callback.

> If we only use pio/mmio
> replay, even though event-tap tries to replay net requests, some
> get lost because the state has proceeded already.

It seems that all you need to do to avoid this is to
delay the callback?
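
To make the suggestion concrete, here is a rough sketch of what
"delay the callback" could look like: completions that arrive while a
transaction is in flight are parked and only run once it finishes.
All names below (DeferQueue, defer_or_complete, transaction_done,
dec_inuse) are made up for illustration and are not existing QEMU or
event-tap functions; the fixed-size array is just to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these are QEMU or event-tap APIs. */
typedef void (*completion_fn)(void *opaque);

#define MAX_DEFERRED 16

typedef struct {
    completion_fn fn;
    void *opaque;
} DeferredCompletion;

typedef struct {
    DeferredCompletion items[MAX_DEFERRED];
    size_t count;
    int in_transaction;   /* nonzero while a transaction is in flight */
} DeferQueue;

/* Run the completion now, or park it if a transaction is in progress.
 * Returns 0 on success, -1 if the queue is full. */
static int defer_or_complete(DeferQueue *q, completion_fn fn, void *opaque)
{
    if (!q->in_transaction) {
        fn(opaque);
        return 0;
    }
    if (q->count == MAX_DEFERRED) {
        return -1;
    }
    q->items[q->count].fn = fn;
    q->items[q->count].opaque = opaque;
    q->count++;
    return 0;
}

/* Called when the transaction finishes: run parked completions in order. */
static void transaction_done(DeferQueue *q)
{
    size_t i;

    q->in_transaction = 0;
    for (i = 0; i < q->count; i++) {
        q->items[i].fn(q->items[i].opaque);
    }
    q->count = 0;
}

/* Stand-in for a device completion that would release a ring slot. */
static void dec_inuse(void *opaque)
{
    int *inuse = opaque;

    (*inuse)--;
}
```

With something along these lines the device state (inuse in the
stand-in above) would not advance until the transaction is done, so
replay at a single level might be enough.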

> This doesn't
> happen with block, because the state is still old enough to
> replay.  Note that using the hybrid approach won't cause duplicated
> requests on the secondary.

Devices assume that a buffer is no longer in use once its completion
callback has been invoked. Does this violate that assumption?
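
To illustrate the concern: if a queued request only keeps a pointer to
the caller's buffer, the data can change underneath it after the
completion callback runs. One way to stay safe, sketched below, is for
the log entry to take a private copy when the request is queued.
LoggedBuf and its helpers are hypothetical names for illustration
only; I am not claiming this is what event-tap does today.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical log-entry buffer; not an existing event-tap structure. */
typedef struct {
    unsigned char *data;  /* private copy owned by the log entry */
    size_t len;
} LoggedBuf;

/* Copy the caller's payload so later reuse of src cannot affect replay.
 * Returns 0 on success, -1 on allocation failure. */
static int logged_buf_capture(LoggedBuf *lb, const void *src, size_t len)
{
    /* Allocate at least one byte so a zero-length payload is not
     * mistaken for an allocation failure. */
    lb->data = malloc(len ? len : 1);
    if (lb->data == NULL) {
        lb->len = 0;
        return -1;
    }
    memcpy(lb->data, src, len);
    lb->len = len;
    return 0;
}

/* Free the private copy once the entry has been flushed or replayed. */
static void logged_buf_release(LoggedBuf *lb)
{
    free(lb->data);
    lb->data = NULL;
    lb->len = 0;
}
```

The cost is an extra copy per request, so whether this is acceptable
depends on the load you are targeting.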

-- 
MST