2011/1/6 Michael S. Tsirkin <mst@xxxxxxxxxx>:
> On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote:
>> 2011/1/4 Michael S. Tsirkin <mst@xxxxxxxxxx>:
>> > On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
>> >> 2011/1/4 Michael S. Tsirkin <mst@xxxxxxxxxx>:
>> >> > On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
>> >> >> 2011/1/4 Michael S. Tsirkin <mst@xxxxxxxxxx>:
>> >> >> > On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
>> >> >> >> 2010/11/29 Stefan Hajnoczi <stefanha@xxxxxxxxx>:
>> >> >> >> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
>> >> >> >> > <tamura.yoshiaki@xxxxxxxxxxxxx> wrote:
>> >> >> >> >> event-tap controls when to start FT transaction, and provides proxy
>> >> >> >> >> functions to called from net/block devices. While FT transaction, it
>> >> >> >> >> queues up net/block requests, and flush them when the transaction gets
>> >> >> >> >> completed.
>> >> >> >> >>
>> >> >> >> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@xxxxxxxxxxxxx>
>> >> >> >> >> Signed-off-by: OHMURA Kei <ohmura.kei@xxxxxxxxxxxxx>
>> >> >> >> >> ---
>> >> >> >> >>  Makefile.target |    1 +
>> >> >> >> >>  block.h         |    9 +
>> >> >> >> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> >> >>  event-tap.h     |   34 +++
>> >> >> >> >>  net.h           |    4 +
>> >> >> >> >>  net/queue.c     |    1 +
>> >> >> >> >>  6 files changed, 843 insertions(+), 0 deletions(-)
>> >> >> >> >>  create mode 100644 event-tap.c
>> >> >> >> >>  create mode 100644 event-tap.h
>> >> >> >> >
>> >> >> >> > event_tap_state is checked at the beginning of several functions. If
>> >> >> >> > there is an unexpected state the function silently returns. Should
>> >> >> >> > these checks really be assert() so there is an abort and backtrace if
>> >> >> >> > the program ever reaches this state?
>> >> >> >> >
>> >> >> >> >> +typedef struct EventTapBlkReq {
>> >> >> >> >> +    char *device_name;
>> >> >> >> >> +    int num_reqs;
>> >> >> >> >> +    int num_cbs;
>> >> >> >> >> +    bool is_multiwrite;
>> >> >> >> >
>> >> >> >> > Is multiwrite logging necessary? If event tap is called from within
>> >> >> >> > the block layer then multiwrite is turned into one or more
>> >> >> >> > bdrv_aio_writev() calls.
>> >> >> >> >
>> >> >> >> >> +static void event_tap_replay(void *opaque, int running, int reason)
>> >> >> >> >> +{
>> >> >> >> >> +    EventTapLog *log, *next;
>> >> >> >> >> +
>> >> >> >> >> +    if (!running) {
>> >> >> >> >> +        return;
>> >> >> >> >> +    }
>> >> >> >> >> +
>> >> >> >> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
>> >> >> >> >> +        return;
>> >> >> >> >> +    }
>> >> >> >> >> +
>> >> >> >> >> +    event_tap_state = EVENT_TAP_REPLAY;
>> >> >> >> >> +
>> >> >> >> >> +    QTAILQ_FOREACH(log, &event_list, node) {
>> >> >> >> >> +        EventTapBlkReq *blk_req;
>> >> >> >> >> +
>> >> >> >> >> +        /* event resume */
>> >> >> >> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> >> >> >> >> +        case EVENT_TAP_NET:
>> >> >> >> >> +            event_tap_net_flush(&log->net_req);
>> >> >> >> >> +            break;
>> >> >> >> >> +        case EVENT_TAP_BLK:
>> >> >> >> >> +            blk_req = &log->blk_req;
>> >> >> >> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
>> >> >> >> >> +                switch (log->ioport.index) {
>> >> >> >> >> +                case 0:
>> >> >> >> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
>> >> >> >> >> +                    break;
>> >> >> >> >> +                case 1:
>> >> >> >> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
>> >> >> >> >> +                    break;
>> >> >> >> >> +                case 2:
>> >> >> >> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
>> >> >> >> >> +                    break;
>> >> >> >> >> +                }
>> >> >> >> >> +            } else {
>> >> >> >> >> +                /* EVENT_TAP_MMIO */
>> >> >> >> >> +                cpu_physical_memory_rw(log->mmio.address,
>> >> >> >> >> +                                       log->mmio.buf,
>> >> >> >> >> +                                       log->mmio.len, 1);
>> >> >> >> >> +            }
>> >> >> >> >> +            break;
>> >> >> >> >
>> >> >> >> > Why are net tx packets replayed at the net level but blk requests are
>> >> >> >> > replayed at the pio/mmio level?
>> >> >> >> >
>> >> >> >> > I expected everything to replay either as pio/mmio or as net/block.
>> >> >> >>
>> >> >> >> Stefan,
>> >> >> >>
>> >> >> >> After doing some heavy load tests, I realized that we have to
>> >> >> >> take a hybrid approach to replay for now. This is because when a
>> >> >> >> device moves to the next state (e.g. virtio decreases inuse) is
>> >> >> >> different between net and block. For example, virtio-net
>> >> >> >> decreases inuse upon returning from the net layer,
>> >> >> >> but virtio-blk
>> >> >> >> does that inside of the callback.
>> >> >> >
>> >> >> > For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
>> >> >> > For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
>> >> >> > Both are invoked from a callback.
>> >> >> >
>> >> >> >> If we only use pio/mmio
>> >> >> >> replay, even though event-tap tries to replay net requests, some
>> >> >> >> get lost because the state has proceeded already.
>> >> >> >
>> >> >> > It seems that all you need to do to avoid this is to
>> >> >> > delay the callback?
>> >> >>
>> >> >> Yeah, if it's possible. But if you take a look at virtio-net,
>> >> >> you'll see that virtio_push is called immediately after calling
>> >> >> qemu_sendv_packet
>> >> >> while virtio-blk does that in the callback.
>> >> >
>> >> > This is only if the packet was sent immediately.
>> >> > I was referring to the case where the packet is queued.
>> >>
>> >> I see. I usually don't see packets get queued in the net layer.
>> >> What would be the effect to devices? Restraint sending packets?
>> >
>> > Yes.
>> >
>> >> >
>> >> >> >
>> >> >> >> This doesn't
>> >> >> >> happen with block, because the state is still old enough to
>> >> >> >> replay. Note that using hybrid approach won't cause duplicated
>> >> >> >> requests on the secondary.
>> >> >> >
>> >> >> > An assumption devices make is that a buffer is unused once
>> >> >> > completion callback was invoked. Does this violate that assumption?
>> >> >>
>> >> >> No, it shouldn't. In case of net with net layer replay, we copy
>> >> >> the content of the requests, and in case of block, because we
>> >> >> haven't called the callback yet, the requests remains fresh.
>> >> >>
>> >> >> Yoshi
>> >> >>
>> >> >
>> >> > Yes, as long as you copy it should be fine. Maybe it's a good idea for
>> >> > event-tap to queue all packets to avoid the copy and avoid the need to
>> >> > replay at the net level.
>> >>
>> >> If queuing works fine for the devices, it seems to be a good
>> >> idea. I think the ordering issue doesn't happen still.
>> >>
>> >> Yoshi
>> >
>> > If you replay and both net and pio level, it becomes complex.
>> > Maybe it's ok, but certainly harder to reason about.
>>
>> Michael,
>>
>> It seems queuing at event-tap like in net layer works for devices
>> that use qemu_send_packet_async as you suggested. But for those
>> that use qemu_send_packet, we still need to copy the contents
>> just like net layer queuing does, and net level replay should be
>> kept to handle it.
>> Thanks,
>>
>> Yoshi
>
> Right. And I think it's fine. What I found confusing was
> where both virtio (because avail idx is moved back) and
> the net layer replay the packet.

I agree, and that part is fixed. There won't be double layer replay
for the same device.

Yoshi

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
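For readers following the qemu_send_packet vs. qemu_send_packet_async
point in the thread, here is a minimal, standalone sketch of the two
queuing strategies. It is not taken from the event-tap patch or from
QEMU: TapQueue, TapPacket, tap_send_sync, tap_send_async, tap_flush and
tap_count_cb are all hypothetical names, and the "wire" array is only a
stand-in for handing data to the net layer. The assumption, as described
in the thread, is that a synchronous sender may reuse its buffer as soon
as the call returns (so the queue must copy), while an asynchronous
sender leaves the buffer alone until its completion callback runs (so
the queue can hold a reference and delay the callback).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; this is not QEMU's API. */
typedef void (*TapSentCb)(void *opaque);

typedef struct TapPacket {
    struct TapPacket *next;
    const unsigned char *data; /* bytes to transmit at flush time */
    size_t len;
    unsigned char *copy;       /* non-NULL if the queue owns a private copy */
    TapSentCb sent_cb;         /* async path only: delayed completion */
    void *opaque;
} TapPacket;

typedef struct TapQueue {
    TapPacket *head, **tail;
    unsigned char wire[64];    /* stand-in for "handed to the net layer" */
    size_t wire_len;
} TapQueue;

static void tap_queue_init(TapQueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->wire_len = 0;
}

static int tap_enqueue(TapQueue *q, const unsigned char *data, size_t len,
                       unsigned char *copy, TapSentCb cb, void *opaque)
{
    TapPacket *p = malloc(sizeof(*p));
    if (!p) {
        return -1;
    }
    p->next = NULL;
    p->data = data;
    p->len = len;
    p->copy = copy;
    p->sent_cb = cb;
    p->opaque = opaque;
    *q->tail = p;              /* append, preserving submission order */
    q->tail = &p->next;
    return 0;
}

/* Synchronous sender: the caller may reuse buf on return, so copy it. */
static int tap_send_sync(TapQueue *q, const unsigned char *buf, size_t len)
{
    unsigned char *copy = malloc(len ? len : 1);
    if (!copy) {
        return -1;
    }
    memcpy(copy, buf, len);
    if (tap_enqueue(q, copy, len, copy, NULL, NULL) < 0) {
        free(copy);
        return -1;
    }
    return 0;
}

/* Asynchronous sender: buf stays valid until cb runs, so no copy. */
static int tap_send_async(TapQueue *q, const unsigned char *buf, size_t len,
                          TapSentCb cb, void *opaque)
{
    return tap_enqueue(q, buf, len, NULL, cb, opaque);
}

/* Called when the transaction completes: emit packets in order, then
 * run the delayed completion callbacks. Returns the packet count. */
static int tap_flush(TapQueue *q)
{
    int n = 0;
    TapPacket *p = q->head;
    while (p) {
        TapPacket *next = p->next;
        size_t room = sizeof(q->wire) - q->wire_len;
        size_t take = p->len < room ? p->len : room;
        memcpy(q->wire + q->wire_len, p->data, take);
        q->wire_len += take;
        if (p->sent_cb) {
            p->sent_cb(p->opaque);
        }
        free(p->copy);         /* free(NULL) is a no-op on the async path */
        free(p);
        p = next;
        n++;
    }
    q->head = NULL;
    q->tail = &q->head;
    return n;
}

/* Example completion callback: bumps an int counter. */
static void tap_count_cb(void *opaque)
{
    ++*(int *)opaque;
}
```

In this sketch the device-visible state for the async path only
advances inside tap_flush, which is the "delay the callback" idea from
the thread; the sync path has already returned to the device, which is
why it needs the copy.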