On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
> 2010/11/29 Stefan Hajnoczi <stefanha@xxxxxxxxx>:
> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
> > <tamura.yoshiaki@xxxxxxxxxxxxx> wrote:
> >> event-tap controls when to start FT transaction, and provides proxy
> >> functions to called from net/block devices.  While FT transaction, it
> >> queues up net/block requests, and flush them when the transaction gets
> >> completed.
> >>
> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@xxxxxxxxxxxxx>
> >> Signed-off-by: OHMURA Kei <ohmura.kei@xxxxxxxxxxxxx>
> >> ---
> >>  Makefile.target |    1 +
> >>  block.h         |    9 +
> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  event-tap.h     |   34 +++
> >>  net.h           |    4 +
> >>  net/queue.c     |    1 +
> >>  6 files changed, 843 insertions(+), 0 deletions(-)
> >>  create mode 100644 event-tap.c
> >>  create mode 100644 event-tap.h
> >
> > event_tap_state is checked at the beginning of several functions.  If
> > there is an unexpected state the function silently returns.  Should
> > these checks really be assert() so there is an abort and backtrace if
> > the program ever reaches this state?
> >
> >> +typedef struct EventTapBlkReq {
> >> +    char *device_name;
> >> +    int num_reqs;
> >> +    int num_cbs;
> >> +    bool is_multiwrite;
> >
> > Is multiwrite logging necessary?  If event tap is called from within
> > the block layer then multiwrite is turned into one or more
> > bdrv_aio_writev() calls.
> >
> >> +static void event_tap_replay(void *opaque, int running, int reason)
> >> +{
> >> +    EventTapLog *log, *next;
> >> +
> >> +    if (!running) {
> >> +        return;
> >> +    }
> >> +
> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
> >> +        return;
> >> +    }
> >> +
> >> +    event_tap_state = EVENT_TAP_REPLAY;
> >> +
> >> +    QTAILQ_FOREACH(log, &event_list, node) {
> >> +        EventTapBlkReq *blk_req;
> >> +
> >> +        /* event resume */
> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
> >> +        case EVENT_TAP_NET:
> >> +            event_tap_net_flush(&log->net_req);
> >> +            break;
> >> +        case EVENT_TAP_BLK:
> >> +            blk_req = &log->blk_req;
> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
> >> +                switch (log->ioport.index) {
> >> +                case 0:
> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                case 1:
> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                case 2:
> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                }
> >> +            } else {
> >> +                /* EVENT_TAP_MMIO */
> >> +                cpu_physical_memory_rw(log->mmio.address,
> >> +                                       log->mmio.buf,
> >> +                                       log->mmio.len, 1);
> >> +            }
> >> +            break;
> >
> > Why are net tx packets replayed at the net level but blk requests are
> > replayed at the pio/mmio level?
> >
> > I expected everything to replay either as pio/mmio or as net/block.
>
> Stefan,
>
> After doing some heavy load tests, I realized that we have to
> take a hybrid approach to replay for now.  This is because when a
> device moves to the next state (e.g. virtio decreases inuse) is
> different between net and block.  For example, virtio-net
> decreases inuse upon returning from the net layer,
> but virtio-blk
> does that inside of the callback.

For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
Both are invoked from a callback.

> If we only use pio/mmio
> replay, even though event-tap tries to replay net requests, some
> get lost because the state has proceeded already.

It seems that all you need to do to avoid this is to delay the callback?

> This doesn't
> happen with block, because the state is still old enough to
> replay.  Note that using hybrid approach won't cause duplicated
> requests on the secondary.

An assumption devices make is that a buffer is unused once completion
callback was invoked. Does this violate that assumption?

--
MST
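
To make the "delay the callback" suggestion above concrete, here is a
minimal sketch of the idea. It is not QEMU code: every name in it
(ft_transaction_begin, ft_complete, ft_transaction_commit,
count_completion, MAX_DEFERRED) is hypothetical, and the fixed-size
array only stands in for whatever queue event-tap would really use.
While a transaction is open, completion callbacks are queued instead of
run, so device state such as virtio's inuse count would not advance
until the commit. Unexpected states abort via assert() rather than
returning silently, as asked earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is QEMU API. */
typedef void (*CompletionFunc)(void *opaque);

typedef struct DeferredCompletion {
    CompletionFunc cb;
    void *opaque;
} DeferredCompletion;

#define MAX_DEFERRED 64

static DeferredCompletion deferred[MAX_DEFERRED];
static size_t num_deferred;
static bool in_transaction;

/* Open a transaction; completions arriving from now on are held back. */
static void ft_transaction_begin(void)
{
    assert(!in_transaction);
    in_transaction = true;
}

/*
 * Called at the point where the device model would normally run its
 * completion callback. Outside a transaction the callback runs at once;
 * inside one it is queued, so the device does not yet see the buffer
 * as completed.
 */
static void ft_complete(CompletionFunc cb, void *opaque)
{
    if (!in_transaction) {
        cb(opaque);
        return;
    }
    assert(num_deferred < MAX_DEFERRED);
    deferred[num_deferred].cb = cb;
    deferred[num_deferred].opaque = opaque;
    num_deferred++;
}

/* Close the transaction and run the held-back callbacks in arrival order. */
static void ft_transaction_commit(void)
{
    size_t i;

    assert(in_transaction);
    in_transaction = false;
    for (i = 0; i < num_deferred; i++) {
        deferred[i].cb(deferred[i].opaque);
    }
    num_deferred = 0;
}

/* Example callback: counts completions, standing in for "decrease inuse". */
static void count_completion(void *opaque)
{
    (*(int *)opaque)++;
}
```

In this sketch a buffer counts as in use until its callback actually
runs, so the assumption that a buffer is unused once the completion
callback was invoked would still hold. The cost is that buffers stay
pinned for the length of the transaction. The sketch also does not
handle a callback that opens a new transaction while the commit loop is
running.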