On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
> >> If qemu_aio_flush() is responsible for flushing the outstanding
> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
> >> As I described in the previous message, Kemari queues the
> >> requests first. So in your example above, it should start with
> >>
> >> virtio-net: last_avail_idx 0 inuse 2
> >> event-tap: {A,B}
> >>
> >> As you know, the requests are still in order because the net
> >> layer initiates them in order; this is about submission order,
> >> not completion.
> >>
> >> In the first synchronization, the status above is transferred.
> >> In the next synchronization, the status will be as follows.
> >>
> >> virtio-net: last_avail_idx 1 inuse 1
> >> event-tap: {B}
> >
> > OK, this answers the ordering question.
>
> Glad to hear that!
>
> > Another question: at this point we transfer this status: both
> > event-tap and the virtio ring have the command B,
> > so the remote will have:
> >
> > virtio-net: inuse 0
> > event-tap: {B}
> >
> > Is this right? This already seems to be a problem: when B
> > completes, inuse will go negative.
>
> I think the state above is wrong. inuse 0 means there shouldn't be
> any requests in event-tap. Note that the callback is called only
> when event-tap flushes the requests.
>
> > Next it seems that the remote virtio will resubmit B to event-tap.
> > The remote will then have:
> >
> > virtio-net: inuse 1
> > event-tap: {B, B}
> >
> > This looks kind of wrong ... will two packets go out?
>
> No. Currently, we're just replaying the requests with pio/mmio.
> In the situation above, it should be
>
> virtio-net: inuse 1
> event-tap: {B}
>
> >> Why? Because Kemari flushes the first virtio-net request using
> >> qemu_aio_flush() before each synchronization. If
> >> qemu_aio_flush() doesn't guarantee the order, what you pointed
> >> out would be problematic. So in the final synchronization, the
> >> state should be
> >>
> >> virtio-net: last_avail_idx 2 inuse 0
> >> event-tap: {}
> >>
> >> where A and B were completed in order.
> >>
> >> Yoshi
> >
> > It might be better to discuss block because that's where
> > requests can complete out of order.
>
> It's the same as net. We queue requests and call bdrv_flush when
> sending requests to the block layer. So there shouldn't be any
> inversion.
>
> > So let me see if I understand:
> > - each command passed to event-tap is queued by it,
> >   it is not passed directly to the backend
> > - later, requests are passed to the backend,
> >   always in the same order that they were submitted
> > - each synchronization point flushes all requests
> >   passed to the backend so far
> > - each synchronization transfers all requests not passed to the
> >   backend to the remote, and they are replayed there
>
> Correct.
>
> > Now, to analyse this for correctness I am looking at the original
> > patch because it is smaller and so easier to analyse, and I think
> > it is functionally equivalent; correct me if I am wrong about this.
>
> So you think decreasing last_avail_idx upon save is better than
> updating it in the callback?
>
> > So the reason there's no out of order issue is this
> > (and might be a good thing to put in the commit log
> > or a comment somewhere):
>
> I've added some in the latest patch. Please point it out if it
> wasn't enough.
>
> > At the point of the save callback, event-tap has already flushed
> > the commands passed to the backend. Thus, at the point of the save
> > callback, if a command has completed, all previous commands have
> > been flushed and completed.
> >
> > Therefore inuse is in fact the # of requests passed to event-tap
> > but not yet passed to the backend (for the non-event-tap case all
> > commands are passed to the backend immediately, and because of
> > this inuse is 0), and these are the last inuse commands submitted.
> >
> > Right?
>
> Yep.
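
To make the invariant just confirmed concrete, here is a minimal
self-contained model of the accounting. It is only a sketch with
made-up names (not the actual event-tap or virtio code), and it
assumes in-order completion as discussed above:

#include <assert.h>
#include <stdio.h>

#define MAX_REQ 16

struct model {
    int inuse;                   /* popped by virtio but not yet completed */
    int queued[MAX_REQ];         /* event-tap queue (request ids)          */
    int nqueued;                 /* queued in event-tap, not at backend    */
    int submitted[MAX_REQ];      /* passed to the backend, not completed   */
    int nsubmitted;
};

/* Guest kicks the device: the request goes to event-tap, not the backend. */
static void submit(struct model *m, int req)
{
    m->inuse++;
    m->queued[m->nqueued++] = req;
}

/* Event-tap hands its oldest queued request to the backend (in order). */
static void pass_to_backend(struct model *m)
{
    assert(m->nqueued > 0);
    m->submitted[m->nsubmitted++] = m->queued[0];
    for (int i = 1; i < m->nqueued; i++) {
        m->queued[i - 1] = m->queued[i];
    }
    m->nqueued--;
}

/* Flush (qemu_aio_flush/bdrv_flush in the discussion): wait until
 * everything already passed to the backend has completed, in order. */
static void flush(struct model *m)
{
    m->inuse -= m->nsubmitted;
    m->nsubmitted = 0;
}

int main(void)
{
    struct model m = { 0 };

    submit(&m, 'A');             /* event-tap: {A}   inuse 1 */
    submit(&m, 'B');             /* event-tap: {A,B} inuse 2 */
    pass_to_backend(&m);         /* A goes to the backend, B stays queued */
    flush(&m);                   /* synchronization point */

    /* The invariant from the summary above: at the synchronization
     * point inuse counts exactly the requests still held by event-tap. */
    assert(m.inuse == m.nqueued);
    printf("inuse %d, queued in event-tap %d\n", m.inuse, m.nqueued);
    return 0;
}

With A already passed to the backend and flushed, only B is left both
in inuse and in event-tap's queue, matching the "inuse 1 /
event-tap: {B}" state in the example at the top of the thread.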
>
> > Now a question:
> >
> > When we pass last_avail_idx - inuse to the remote,
> > the remote virtio will resubmit the request.
> > Since the request is also passed by event-tap, we get
> > the request twice; why is this not a problem?
>
> It's not a problem because event-tap currently replays with
> pio/mmio only, as I mentioned above. Although event-tap receives
> information about the queued requests, it won't pass it to the
> backend. The reason is the difficulty of setting the callbacks,
> which are specific to the devices, on the secondary. These are
> pointers and, even worse, usually static functions, which
> event-tap has no way to restore upon failover. I do want to
> change event-tap replay to work this way in the future, but for
> now only pio/mmio replay is implemented.
>
> Thanks,
>
> Yoshi

Then I am still confused, sorry. inuse != 0 means that some requests
were passed to the backend but did not complete. I think that if you
do a flush, it waits until all requests passed to the backend have
completed. Why does this not guarantee inuse = 0 on the origin at the
synchronization point?

--
MST
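
As a postscript, here is a minimal sketch of the "decrease
last_avail_idx upon save" approach being debated above; the names and
structure are illustrative assumptions, not the actual Kemari patch
or QEMU's virtio save/load code:

#include <stdint.h>
#include <stdio.h>

struct vq_state {
    uint16_t last_avail_idx;   /* descriptors popped by the device so far */
    uint16_t inuse;            /* popped but not yet completed            */
};

/* Primary, at save time: transfer the index rolled back by inuse, so
 * the in-flight requests look "not yet popped" to the secondary. */
static uint16_t save_last_avail_idx(const struct vq_state *vq)
{
    return (uint16_t)(vq->last_avail_idx - vq->inuse);
}

/* Secondary, at load time: nothing is in flight yet; the rolled-back
 * index makes the device pop the unfinished descriptors again. */
static void load_vq_state(struct vq_state *vq, uint16_t saved_idx)
{
    vq->last_avail_idx = saved_idx;
    vq->inuse = 0;
}

int main(void)
{
    /* A has completed, B is still in flight on the primary. */
    struct vq_state primary = { .last_avail_idx = 2, .inuse = 1 };
    struct vq_state secondary;

    load_vq_state(&secondary, save_last_avail_idx(&primary));

    /* Prints "last_avail_idx 1 inuse 0": B will be re-popped and replayed. */
    printf("secondary: last_avail_idx %u inuse %u\n",
           (unsigned)secondary.last_avail_idx, (unsigned)secondary.inuse);
    return 0;
}

The secondary starts with inuse 0 and the rolled-back index, so
replaying the pio/mmio kick makes its virtio re-pop the unfinished
descriptor itself; per Tamura's explanation above, event-tap does not
hand its own queued copy to the backend, which is why nothing is
executed twice.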