On Sun, Dec 26, 2010 at 07:14:44PM +0900, Yoshiaki Tamura wrote: > 2010/12/26 Michael S. Tsirkin <mst@xxxxxxxxxx>: > > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote: > >> >> If qemu_aio_flush() is responsible for flushing the outstanding > >> >> virtio-net requests, I'm wondering why it's a problem for Kemari. > >> >> As I described in the previous message, Kemari queues the > >> >> requests first. So in you example above, it should start with > >> >> > >> >> virtio-net: last_avai_idx 0 inuse 2 > >> >> event-tap: {A,B} > >> >> > >> >> As you know, the requests are still in order still because net > >> >> layer initiates in order. Not about completing. > >> >> > >> >> In the first synchronization, the status above is transferred. In > >> >> the next synchronization, the status will be as following. > >> >> > >> >> virtio-net: last_avai_idx 1 inuse 1 > >> >> event-tap: {B} > >> > > >> > OK, this answers the ordering question. > >> > >> Glad to hear that! > >> > >> > Another question: at this point we transfer this status: both > >> > event-tap and virtio ring have the command B, > >> > so the remote will have: > >> > > >> > virtio-net: inuse 0 > >> > event-tap: {B} > >> > > >> > Is this right? This already seems to be a problem as when B completes > >> > inuse will go negative? > >> > >> I think state above is wrong. inuse 0 means there shouldn't be > >> any requests in event-tap. Note that the callback is called only > >> when event-tap flushes the requests. > >> > >> > Next it seems that the remote virtio will resubmit B to event-tap. The > >> > remote will then have: > >> > > >> > virtio-net: inuse 1 > >> > event-tap: {B, B} > >> > > >> > This looks kind of wrong ... will two packets go out? > >> > >> No. Currently, we're just replaying the requests with pio/mmio. > >> In the situation above, it should be, > >> > >> virtio-net: inuse 1 > >> event-tap: {B} > >> >> Why? Because Kemari flushes the first virtio-net request using > >> >> qemu_aio_flush() before each synchronization. If > >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed > >> >> should be problematic. So in the final synchronization, the > >> >> state should be, > >> >> > >> >> virtio-net: last_avai_idx 2 inuse 0 > >> >> event-tap: {} > >> >> > >> >> where A,B were completed in order. > >> >> > >> >> Yoshi > >> > > >> > > >> > It might be better to discuss block because that's where > >> > requests can complete out of order. > >> > >> It's same as net. We queue requests and call bdrv_flush per > >> sending requests to the block. So there shouldn't be any > >> inversion. > >> > >> > So let me see if I understand: > >> > - each command passed to event tap is queued by it, > >> > it is not passed directly to the backend > >> > - later requests are passed to the backend, > >> > always in the same order that they were submitted > >> > - each synchronization point flushes all requests > >> > passed to the backend so far > >> > - each synchronization transfers all requests not passed to the backend, > >> > to the remote, and they are replayed there > >> > >> Correct. > >> > >> > Now to analyse this for correctness I am looking at the original patch > >> > because it is smaller so easier to analyse and I think it is > >> > functionally equivalent, correct me if I am wrong in this. > >> > >> So you think decreasing last_avail_idx upon save is better than > >> updating it in the callback? > >> > >> > So the reason there's no out of order issue is this > >> > (and might be a good thing to put in commit log > >> > or a comment somewhere): > >> > >> I've done some in the latest patch. Please point it out if it > >> wasn't enough. > >> > >> > At point of save callback event tap has flushed commands > >> > passed to the backend already. Thus at the point of > >> > the save callback if a command has completed > >> > all previous commands have been flushed and completed. > >> > > >> > > >> > Therefore inuse is > >> > in fact the # of requests passed to event tap but not yet > >> > passed to the backend (for non-event tap case all commands are > >> > passed to the backend immediately and because of this > >> > inuse is 0) and these are the last inuse commands submitted. > >> > > >> > > >> > Right? > >> > >> Yep. > >> > >> > Now a question: > >> > > >> > When we pass last_used_index - inuse to the remote, > >> > the remote virtio will resubmit the request. > >> > Since request is also passed by event tap, we get > >> > the request twice, why is this not a problem? > >> > >> It's not a problem because event-tap currently replays with > >> pio/mmio only, as I mentioned above. Although event-tap receives > >> information about the queued requests, it won't pass it to the > >> backend. The reason is the problem in setting the callbacks > >> which are specific to devices on the secondary. These are > >> pointers, and even worse, are usually static functions, which > >> event-tap has no way to restore it upon failover. I do want to > >> change event-tap replay to be this way in the future, pio/mmio > >> replay is implemented for now. > >> > >> Thanks, > >> > >> Yoshi > >> > > > > Then I am still confused, sorry. inuse != 0 means that some requests > > were passed to the backend but did not complete. I think that if you do > > a flush, this waits until all requests passed to the backend will > > complete. Why does not this guarantee inuse = 0 on the origin at the > > synchronization point? > > The synchronization is done before event-tap releases requests to > the backend, so there are two types of flush: event-tap and > backend block/net. I assume you're confused with the fact that > flushing backend with qemu_aio_flush/bdrv_flush doesn't necessary > decrease inuse if event-tap has queued requests because there are > no requests passed to the backend. Let me do a case study again. > > virtio: inuse 4 > event-tap: {A,B,C} > backend: {D} > There are two event-tap devices, right? PIO one is above virtio, AIO one is between virtio and backend (e.g. bdrv)? Which one is meant here? > synchronization starts. backend gets flushed. > > virtio: inuse 3 > event-tap: {A,B,C} > backend: {} > synchronization gets done. > # secondary is virtio: inuse 3 > > event-tap flushes one request. > > virtio: inuse 2 > event-tap: {B,C} > backend: {} > repeats above and finally it should be, > > virtio: inuse 0 > event-tap: {} > > Hope this helps. > > Yoshi > > > > > -- > > MST > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html