2010/12/26 Michael S. Tsirkin <mst@xxxxxxxxxx>:
> On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
>> >> If qemu_aio_flush() is responsible for flushing the outstanding
>> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
>> >> As I described in the previous message, Kemari queues the
>> >> requests first. So in your example above, it should start with
>> >>
>> >> virtio-net: last_avail_idx 0 inuse 2
>> >> event-tap: {A,B}
>> >>
>> >> As you know, the requests are still in order because the net
>> >> layer initiates them in order. This says nothing about completion order.
>> >>
>> >> In the first synchronization, the status above is transferred. In
>> >> the next synchronization, the status will be as follows.
>> >>
>> >> virtio-net: last_avail_idx 1 inuse 1
>> >> event-tap: {B}
>> >
>> > OK, this answers the ordering question.
>>
>> Glad to hear that!
>>
>> > Another question: at this point we transfer this status: both
>> > event-tap and the virtio ring have the command B,
>> > so the remote will have:
>> >
>> > virtio-net: inuse 0
>> > event-tap: {B}
>> >
>> > Is this right? This already seems to be a problem, as when B completes
>> > inuse will go negative?
>>
>> I think the state above is wrong. inuse 0 means there shouldn't be
>> any requests in event-tap. Note that the callback is called only
>> when event-tap flushes the requests.
>>
>> > Next it seems that the remote virtio will resubmit B to event-tap. The
>> > remote will then have:
>> >
>> > virtio-net: inuse 1
>> > event-tap: {B, B}
>> >
>> > This looks kind of wrong ... will two packets go out?
>>
>> No. Currently, we're just replaying the requests with pio/mmio.
>> In the situation above, it should be
>>
>> virtio-net: inuse 1
>> event-tap: {B}
>>
>> Why? Because Kemari flushes the first virtio-net request using
>> >> qemu_aio_flush() before each synchronization. If
>> >> qemu_aio_flush() didn't guarantee the order, what you pointed
>> >> out would be problematic. So in the final synchronization, the
>> >> state should be
>> >>
>> >> virtio-net: last_avail_idx 2 inuse 0
>> >> event-tap: {}
>> >>
>> >> where A and B were completed in order.
>> >>
>> >> Yoshi
>> >
>> >
>> > It might be better to discuss block because that's where
>> > requests can complete out of order.
>>
>> It's the same as net. We queue requests and call bdrv_flush when
>> sending requests to the block layer. So there shouldn't be any
>> inversion.
>>
>> > So let me see if I understand:
>> > - each command passed to event-tap is queued by it,
>> > it is not passed directly to the backend
>> > - later, requests are passed to the backend,
>> > always in the same order that they were submitted
>> > - each synchronization point flushes all requests
>> > passed to the backend so far
>> > - each synchronization transfers all requests not passed to the backend
>> > to the remote, and they are replayed there
>>
>> Correct.
>>
>> > Now, to analyse this for correctness I am looking at the original patch,
>> > because it is smaller so easier to analyse, and I think it is
>> > functionally equivalent; correct me if I am wrong in this.
>>
>> So you think decreasing last_avail_idx upon save is better than
>> updating it in the callback?
>>
>> > So the reason there's no out-of-order issue is this
>> > (and it might be a good thing to put in the commit log
>> > or a comment somewhere):
>>
>> I've added some of this in the latest patch. Please point it out if it
>> isn't enough.
>>
>> > At the point of the save callback, event-tap has already flushed the
>> > commands passed to the backend.
>> > Thus, at the point of
>> > the save callback, if a command has completed,
>> > all previous commands have been flushed and completed.
>> >
>> >
>> > Therefore inuse is
>> > in fact the # of requests passed to event-tap but not yet
>> > passed to the backend (in the non-event-tap case all commands are
>> > passed to the backend immediately, and because of this
>> > inuse is 0), and these are the last inuse commands submitted.
>> >
>> >
>> > Right?
>>
>> Yep.
>>
>> > Now a question:
>> >
>> > When we pass last_used_index - inuse to the remote,
>> > the remote virtio will resubmit the request.
>> > Since the request is also passed by event-tap, we get
>> > the request twice; why is this not a problem?
>>
>> It's not a problem because event-tap currently replays with
>> pio/mmio only, as I mentioned above. Although event-tap receives
>> information about the queued requests, it won't pass it to the
>> backend. The reason is the difficulty of setting the callbacks
>> which are specific to devices on the secondary. These are
>> pointers, and even worse, are usually static functions, which
>> event-tap has no way to restore upon failover. I do want to
>> change event-tap replay to work this way in the future, but pio/mmio
>> replay is what is implemented for now.
>>
>> Thanks,
>>
>> Yoshi
>>
>
> Then I am still confused, sorry. inuse != 0 means that some requests
> were passed to the backend but did not complete. I think that if you do
> a flush, it waits until all requests passed to the backend have
> completed. Why doesn't this guarantee inuse = 0 on the origin at the
> synchronization point?

The synchronization is done before event-tap releases requests to the
backend, so there are two types of flush: event-tap and backend
(block/net). I assume you're confused by the fact that flushing the
backend with qemu_aio_flush/bdrv_flush doesn't necessarily decrease
inuse when event-tap has queued requests, because those requests
haven't been passed to the backend yet. Let me do a case study again.

virtio: inuse 4
event-tap: {A,B,C}
backend: {D}

Synchronization starts. The backend gets flushed.

virtio: inuse 3
event-tap: {A,B,C}
backend: {}

Synchronization gets done.

# secondary is
virtio: inuse 3

event-tap flushes one request.

virtio: inuse 2
event-tap: {B,C}
backend: {}

This repeats, and finally it should be

virtio: inuse 0
event-tap: {}

Hope this helps.

Yoshi

>
> --
> MST
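
For readers following the thread, here is a minimal C sketch of the save-time
bookkeeping discussed above. The struct and function names are hypothetical
(not the actual Kemari/QEMU code); it only illustrates the idea of
"decreasing last_avail_idx upon save": since the backend has been flushed at
the save callback, the inuse requests are exactly the ones still queued in
event-tap, and rewinding last_avail_idx by inuse makes the secondary's virtio
fetch and resubmit them when the guest kick is replayed via pio/mmio.

#include <stdint.h>

/* Hypothetical illustration only -- not the actual virtio code. */
typedef struct VirtQueueState {
    uint16_t last_avail_idx;  /* next ring index virtio will fetch */
    uint16_t inuse;           /* submitted but not yet completed   */
} VirtQueueState;

/* At the save callback the backend is already flushed, so the inuse
 * requests are exactly those still queued in event-tap.  Transferring
 * last_avail_idx - inuse makes the secondary fetch them again. */
static uint16_t index_to_transfer(const VirtQueueState *vq)
{
    return (uint16_t)(vq->last_avail_idx - vq->inuse);
}

/* Example from the thread: last_avail_idx 2, inuse 2 (A and B queued in
 * event-tap).  The secondary receives index 0 with inuse 0, and when the
 * guest kick is replayed it resubmits A then B in their original order. */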
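
And a sketch of the two flush levels in the case study above, again with
made-up names (eventtap_queue, backend_flush(), req_complete() stand in for
the real event-tap queue and the qemu_aio_flush()/bdrv_flush() paths):

#include <stdio.h>
#include <string.h>

#define MAX_REQS 8

static char eventtap_queue[MAX_REQS];   /* queued by event-tap, not yet in the backend */
static int  eventtap_len;
static char backend_inflight[MAX_REQS]; /* passed to the backend, not yet completed    */
static int  backend_len;
static int  inuse;                      /* virtio view: submitted, not completed       */

/* Completion callback: fires only when a request finishes in the backend. */
static void req_complete(char req)
{
    inuse--;
    printf("completed %c, inuse now %d\n", req, inuse);
}

/* Stands in for qemu_aio_flush()/bdrv_flush(): waits for backend requests
 * only.  It cannot touch requests still queued in event-tap, which is why
 * inuse does not necessarily drop to 0 here. */
static void backend_flush(void)
{
    for (int i = 0; i < backend_len; i++) {
        req_complete(backend_inflight[i]);
    }
    backend_len = 0;
}

/* event-tap releases its oldest queued request to the backend. */
static void eventtap_release_one(void)
{
    backend_inflight[backend_len++] = eventtap_queue[0];
    eventtap_len--;
    memmove(eventtap_queue, eventtap_queue + 1, eventtap_len);
}

int main(void)
{
    /* Initial state from the case study:
     * virtio: inuse 4, event-tap: {A,B,C}, backend: {D} */
    inuse = 4;
    eventtap_queue[0] = 'A'; eventtap_queue[1] = 'B'; eventtap_queue[2] = 'C';
    eventtap_len = 3;
    backend_inflight[0] = 'D';
    backend_len = 1;

    backend_flush();            /* sync starts: inuse 3, event-tap {A,B,C} */
    /* ... inuse 3 and the queued A,B,C are transferred to the secondary ... */

    while (eventtap_len > 0) {  /* after sync, drain event-tap in order */
        eventtap_release_one();
        backend_flush();        /* inuse 2, then 1, then 0 */
    }
    return 0;
}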