2010/12/26 Michael S. Tsirkin <mst@xxxxxxxxxx>:
> On Sun, Dec 26, 2010 at 07:14:44PM +0900, Yoshiaki Tamura wrote:
>> 2010/12/26 Michael S. Tsirkin <mst@xxxxxxxxxx>:
>> > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
>> >> >> If qemu_aio_flush() is responsible for flushing the outstanding
>> >> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
>> >> >> As I described in the previous message, Kemari queues the
>> >> >> requests first. So in your example above, it should start with
>> >> >>
>> >> >> virtio-net: last_avail_idx 0 inuse 2
>> >> >> event-tap: {A,B}
>> >> >>
>> >> >> As you know, the requests are still in order because the net
>> >> >> layer initiates them in order; this is about submission order,
>> >> >> not completion order.
>> >> >>
>> >> >> In the first synchronization, the status above is transferred. In
>> >> >> the next synchronization, the status will be as follows.
>> >> >>
>> >> >> virtio-net: last_avail_idx 1 inuse 1
>> >> >> event-tap: {B}
>> >> >
>> >> > OK, this answers the ordering question.
>> >>
>> >> Glad to hear that!
>> >>
>> >> > Another question: at this point we transfer this status: both
>> >> > event-tap and the virtio ring have the command B,
>> >> > so the remote will have:
>> >> >
>> >> > virtio-net: inuse 0
>> >> > event-tap: {B}
>> >> >
>> >> > Is this right? This already seems to be a problem, as when B
>> >> > completes, inuse will go negative?
>> >>
>> >> I think the state above is wrong. inuse 0 means there shouldn't be
>> >> any requests in event-tap. Note that the callback is called only
>> >> when event-tap flushes the requests.
>> >>
>> >> > Next it seems that the remote virtio will resubmit B to event-tap. The
>> >> > remote will then have:
>> >> >
>> >> > virtio-net: inuse 1
>> >> > event-tap: {B, B}
>> >> >
>> >> > This looks kind of wrong ... will two packets go out?
>> >>
>> >> No. Currently, we're just replaying the requests with pio/mmio.
>> >> In the situation above, it should be,
>> >>
>> >> virtio-net: inuse 1
>> >> event-tap: {B}
>> >>
>> >> >> Why? Because Kemari flushes the first virtio-net request using
>> >> >> qemu_aio_flush() before each synchronization. If
>> >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed
>> >> >> out should be problematic. So in the final synchronization, the
>> >> >> state should be,
>> >> >>
>> >> >> virtio-net: last_avail_idx 2 inuse 0
>> >> >> event-tap: {}
>> >> >>
>> >> >> where A,B were completed in order.
>> >> >>
>> >> >> Yoshi
>> >> >
>> >> >
>> >> > It might be better to discuss block because that's where
>> >> > requests can complete out of order.
>> >>
>> >> It's the same as net. We queue requests and call bdrv_flush before
>> >> sending requests to the block backend. So there shouldn't be any
>> >> inversion.
>> >>
>> >> > So let me see if I understand:
>> >> > - each command passed to event-tap is queued by it;
>> >> >   it is not passed directly to the backend
>> >> > - later, requests are passed to the backend,
>> >> >   always in the same order that they were submitted
>> >> > - each synchronization point flushes all requests
>> >> >   passed to the backend so far
>> >> > - each synchronization transfers all requests not yet passed to
>> >> >   the backend to the remote, and they are replayed there
>> >>
>> >> Correct.
>> >>
>> >> > Now, to analyse this for correctness, I am looking at the original patch
>> >> > because it is smaller and so easier to analyse, and I think it is
>> >> > functionally equivalent; correct me if I am wrong in this.
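To make the queuing discipline summarized in the four points above
concrete, here is a minimal sketch of the bookkeeping in C. All names
are hypothetical; this is not the actual Kemari/event-tap code, only
the FIFO rule and the inuse invariant under discussion.

    #include <stdio.h>

    #define QMAX 16

    struct req { int id; };

    static struct req queue[QMAX];   /* FIFO: accepted, not yet passed on */
    static int q_head, q_len;
    static int inuse;                /* submitted by guest, not completed */

    /* The device submits a request: event-tap only queues it. */
    static void event_tap_submit(struct req r)
    {
        queue[(q_head + q_len++) % QMAX] = r;
        inuse++;
    }

    /* Pass the oldest queued request to the backend, flush, complete.
     * Requests leave strictly in submission order, so no inversion. */
    static void event_tap_flush_one(void)
    {
        struct req r = queue[q_head];
        q_head = (q_head + 1) % QMAX;
        q_len--;
        /* backend_submit(r); qemu_aio_flush();  <- request completes */
        inuse--;
        printf("request %d completed\n", r.id);
    }

    int main(void)
    {
        event_tap_submit((struct req){ .id = 0 });   /* A */
        event_tap_submit((struct req){ .id = 1 });   /* B */
        /* The save callback would run here: the backend is already
         * flushed, so inuse == q_len == 2, and the queued requests are
         * exactly the ones the remote virtio must resubmit. */
        printf("sync: inuse=%d queued=%d\n", inuse, q_len);
        event_tap_flush_one();                       /* A completes */
        event_tap_flush_one();                       /* B completes */
        printf("final: inuse=%d queued=%d\n", inuse, q_len);
        return 0;
    }

The point of the sketch is the invariant at the save callback:
everything already handed to the backend has completed, so inuse
counts exactly the still-queued requests, which is why decreasing
last_avail_idx by inuse upon save lets the remote resubmit just those.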
>> >>
>> >> So you think decreasing last_avail_idx upon save is better than
>> >> updating it in the callback?
>> >>
>> >> > So the reason there's no out of order issue is this
>> >> > (and might be a good thing to put in the commit log
>> >> > or a comment somewhere):
>> >>
>> >> I've done some of that in the latest patch. Please point it out if it
>> >> wasn't enough.
>> >>
>> >> > At the point of the save callback, event-tap has already flushed
>> >> > the commands passed to the backend. Thus at the point of
>> >> > the save callback, if a command has completed,
>> >> > all previous commands have been flushed and completed.
>> >> >
>> >> >
>> >> > Therefore inuse is
>> >> > in fact the # of requests passed to event-tap but not yet
>> >> > passed to the backend (for the non-event-tap case all commands are
>> >> > passed to the backend immediately, and because of this
>> >> > inuse is 0), and these are the last inuse commands submitted.
>> >> >
>> >> >
>> >> > Right?
>> >>
>> >> Yep.
>> >>
>> >> > Now a question:
>> >> >
>> >> > When we pass last_used_index - inuse to the remote,
>> >> > the remote virtio will resubmit the request.
>> >> > Since the request is also passed by event-tap, we get
>> >> > the request twice; why is this not a problem?
>> >>
>> >> It's not a problem because event-tap currently replays with
>> >> pio/mmio only, as I mentioned above. Although event-tap receives
>> >> information about the queued requests, it won't pass it to the
>> >> backend. The reason is the problem of setting the callbacks,
>> >> which are specific to devices, on the secondary. These are
>> >> pointers, and even worse, are usually static functions, which
>> >> event-tap has no way to restore upon failover. I do want to
>> >> change event-tap replay to work this way in the future; only
>> >> pio/mmio replay is implemented for now.
>> >>
>> >> Thanks,
>> >>
>> >> Yoshi
>> >>
>> >
>> > Then I am still confused, sorry. inuse != 0 means that some requests
>> > were passed to the backend but did not complete. I think that if you do
>> > a flush, this waits until all requests passed to the backend
>> > complete. Why does this not guarantee inuse = 0 on the origin at the
>> > synchronization point?
>>
>> The synchronization is done before event-tap releases requests to
>> the backend, so there are two types of flush: event-tap and
>> backend block/net. I assume you're confused by the fact that
>> flushing the backend with qemu_aio_flush/bdrv_flush doesn't
>> necessarily decrease inuse if event-tap has queued requests,
>> because those requests haven't been passed to the backend yet.
>> Let me do a case study again.
>>
>> virtio: inuse 4
>> event-tap: {A,B,C}
>> backend: {D}
>>
>
>
> There are two event-tap devices, right?
> The PIO one is above virtio, and the AIO one is between virtio and
> the backend (e.g. bdrv)? Which one is meant here?

Right. I'm referring to the latter, between virtio and the backend.
Note that the event-tap function for pio/mmio doesn't queue anything
but just records what initiated the requests.

Yoshi

>
>
>> synchronization starts. backend gets flushed.
>>
>> virtio: inuse 3
>> event-tap: {A,B,C}
>> backend: {}
>>
>> synchronization gets done.
>> # secondary is virtio: inuse 3
>>
>> event-tap flushes one request.
>>
>> virtio: inuse 2
>> event-tap: {B,C}
>> backend: {}
>>
>> The above repeats, and finally it should be,
>>
>> virtio: inuse 0
>> event-tap: {}
>>
>> Hope this helps.
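To check the arithmetic of the case study above (virtio inuse 4,
event-tap {A,B,C}, backend {D}), here is a tiny standalone C
walk-through. It models only the counters; the names are hypothetical
and this is not the Kemari code.

    #include <stdio.h>

    int main(void)
    {
        int inuse = 4;      /* virtio requests not yet completed   */
        int tapq  = 3;      /* {A,B,C}: still queued in event-tap  */
        int backend = 1;    /* {D}: already passed to the backend  */

        /* Synchronization starts: the backend gets flushed
         * (qemu_aio_flush/bdrv_flush). Only D completes; the event-tap
         * queue is untouched, so inuse drops to 3, not to 0. */
        inuse -= backend;
        backend = 0;
        printf("after backend flush: inuse=%d tapq=%d\n", inuse, tapq);

        /* Synchronization gets done; the secondary sees inuse 3 plus
         * the three queued requests. Event-tap then releases them one
         * by one, each flushed through the backend before the next
         * synchronization point. */
        while (tapq > 0) {
            tapq--;         /* pass the oldest request to the backend */
            inuse--;        /* ... flush it, and it completes         */
            printf("event-tap flush: inuse=%d tapq=%d\n", inuse, tapq);
        }
        return 0;           /* ends with inuse 0, event-tap {} */
    }

The key step is the first one: flushing the backend only completes D,
so inuse stays at 3 on the origin, and those three requests are
exactly the ones the secondary will see queued in event-tap.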
>>
>> Yoshi
>>
>> >
>> > --
>> > MST
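On the pio/mmio replay point above, here is a minimal sketch of the
idea, again with hypothetical names rather than the actual QEMU code:
the primary records the guest's kick, and after failover the
secondary re-issues the same write, so the device model rebuilds the
request itself and no device-specific callback pointers ever need to
be saved or restored.

    #include <stdint.h>
    #include <stdio.h>

    struct kick_record {
        uint64_t addr;   /* pio/mmio address the guest wrote   */
        uint32_t val;    /* value written (e.g. a queue index) */
    };

    /* On the primary: remember the kick itself. Completion callbacks
     * are static function pointers and cannot be migrated, so they
     * are deliberately not recorded. */
    static struct kick_record last_kick;

    static void record_kick(uint64_t addr, uint32_t val)
    {
        last_kick.addr = addr;
        last_kick.val  = val;
    }

    /* On the secondary after failover: replay the write. The device
     * model re-scans its avail ring, so the queued request is rebuilt
     * locally with fresh, local callbacks. */
    static void replay_kick(void (*mmio_write)(uint64_t, uint32_t))
    {
        mmio_write(last_kick.addr, last_kick.val);
    }

    static void dummy_mmio(uint64_t addr, uint32_t val)
    {
        printf("replayed write of 0x%x to 0x%llx\n",
               (unsigned)val, (unsigned long long)addr);
    }

    int main(void)
    {
        record_kick(0xfebf0000, 1);   /* primary records the kick */
        replay_kick(dummy_mmio);      /* secondary replays it     */
        return 0;
    }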