On Nov 21, 2013, at 3:47 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote: >>> But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are >>> executing? >> >> Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated >> when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus >> should be stopped. >> >>> >>> With fast page fault: >>> >>> if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte) >>> /* The window in here... */ >>> mark_page_dirty(vcpu->kvm, gfn); >>> >>> And the $SUBJECT set_spte reordering, the rule becomes >>> >>> A call to GET_DIRTY_LOG guarantees to return correct information about >>> dirty pages before invocation of the previous GET_DIRTY_LOG call. >>> >>> (see example 1: the next GET_DIRTY_LOG will return the dirty information >>> there). >>> >> >> It seems no. >> >> The first GET_DIRTY_LOG can happen before fast-page-fault, >> the second GET_DIRTY_LOG happens in the window between cmpxchg() >> and mark_page_dirty(), for the second one, the information is still “incorrect”. > > Its correct for the previous GET_DIRTY_LOG call. Oh, yes. > >>> The rule for sptes that is, because kvm_write_guest does not match the >>> documentation at all. >> >> You mean the case of “kvm_write_guest” is valid (I do not know why it is)? >> Or anything else? >> >>> >>> So before example 1 and this patch, the rule (well for sptes at least) was >>> >>> "Given a memory slot, return a bitmap containing any pages dirtied >>> since the last call to this ioctl. Bit 0 is the first page in the >>> memory slot. Ensure the entire structure is cleared to avoid padding >>> issues." >>> >>> Can you explain why it is OK to relax this rule? >> >> It’s because: >> 1) it doesn’t break current use cases, i.e. Live migration and FB-flushing. >> 2) the current code, like kvm_write_guest has already broken the documentation >> (the guest page has been written but missed in the dirty bitmap). >> 3) it’s needless to implement a exact get-dirty-pages since the dirty pages can >> no be exactly got except stopping vcpus. >> >> So i think we'd document this case instead. No? > > Lets figure out the requirements, then. I don't understand why > FB-flushing is safe (think kvm-autotest: one pixel off the entire > test fails). I did not read FB-flushing code, i guess the reason why it can work is: FB-flushing do periodicity get-dirty-page and flush it. After guest writes the page, the page will be flushed in the next GET_DIRTY_LOG. > > Before fast page fault: Pages are either write protected or the > corresponding dirty bitmap bit is set. Any write faults to dirty logged > sptes while GET_DIRTY log is executing in the protected section are > allowed to instantiate writeable spte after GET_DIRTY log is finished. > > After fast page fault: Pages can be writeable and the dirty bitmap not > set. Therefore data can be dirty before GET_DIRTY executes and still > fail to be returned in the bitmap. It’s right. The current GET_DIRTY fail to get the dirty page but the next GET_DIRTY can get it properly since the current GET_DIRTY need to flush all TLBs that waits for fast-page-fault finish. I do not think it’s a big problem since single GET_DIRTY is useless as i explained in the previous mail. > > Since this patchset does not introduce change in behaviour (fast pf > did), no reason to avoid merging this. Yes, thank you, Marcelo! :) > > BTW, since GET_DIRTY log does not require to report concurrent (to > GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte > list, is it? You mean the “rewalk” we introduced in pte_list_walk_lockless() in this patchset? I think this rewalk is needed because it’s caused by meet a unexpected nulls that means we’re walking on the unexpected rmap. If we do not do this, some writable sptes will be missed. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html