On Fri, Nov 15, 2013 at 03:09:13PM +0800, Xiao Guangrong wrote: > On 11/15/2013 02:39 AM, Marcelo Tosatti wrote: > > On Thu, Nov 14, 2013 at 01:15:24PM +0800, Xiao Guangrong wrote: > >> > >> Hi Marcelo, > >> > >> On 11/14/2013 08:36 AM, Marcelo Tosatti wrote: > >> > >>> > >>> Any code location which reads the writable bit in the spte and assumes if its not > >>> set, that the translation which the spte refers to is not cached in a > >>> remote CPU's TLB can become buggy. (*) > >>> > >>> It might be the case that now its not an issue, but its so subtle that > >>> it should be improved. > >>> > >>> Can you add a fat comment on top of is_writeable_bit describing this? > >>> (and explain why is_writable_pte users do not make an assumption > >>> about (*). > >>> > >>> "Writeable bit of locklessly modifiable sptes might be cleared > >>> but TLBs not flushed: so whenever reading locklessly modifiable sptes > >>> you cannot assume TLBs are flushed". > >>> > >>> For example this one is unclear: > >>> > >>> if (!can_unsync && is_writable_pte(*sptep)) > >>> goto set_pte; > >>> And: > >>> > >>> if (!is_writable_pte(spte) && > >>> !(pt_protect && spte_is_locklessly_modifiable(spte))) > >>> return false; > >>> > >>> This is safe because get_dirty_log/kvm_mmu_slot_remove_write_access are > >>> serialized by a single mutex (if there were two mutexes, it would not be > >>> safe). Can you add an assert to both > >>> kvm_mmu_slot_remove_write_access/kvm_vm_ioctl_get_dirty_log > >>> for (slots_lock) is locked, and explain? > >>> > >>> So just improve the comments please, thanks (no need to resend whole > >>> series). > >> > >> Thank you very much for your time to review it and really appreciate > >> for you detailed the issue so clearly to me. > >> > >> I will do it on the top of this patchset or after it is merged > >> (if it's possiable). > > > > Ok, can you explain why every individual caller of is_writable_pte have > > no such assumption now? (the one mentioned above is not clear to me for > > example, should explain all of them). > > Okay. OK, thanks for checking. > Generally speak, we 1) needn't care readonly spte too much since it > can not be locklessly write-protected and 2) if is_writable_pte() is used > to check mmu-mode's state we can check SPTE_MMU_WRITEABLE instead. > > There are the places is_writable_pte is used: > 1) in spte_has_volatile_bits(): > 527 static bool spte_has_volatile_bits(u64 spte) > 528 { > 529 /* > 530 * Always atomicly update spte if it can be updated > 531 * out of mmu-lock, it can ensure dirty bit is not lost, > 532 * also, it can help us to get a stable is_writable_pte() > 533 * to ensure tlb flush is not missed. > 534 */ > 535 if (spte_is_locklessly_modifiable(spte)) > 536 return true; > 537 > 538 if (!shadow_accessed_mask) > 539 return false; > 540 > 541 if (!is_shadow_present_pte(spte)) > 542 return false; > 543 > 544 if ((spte & shadow_accessed_mask) && > 545 (!is_writable_pte(spte) || (spte & shadow_dirty_mask))) > 546 return false; > 547 > 548 return true; > 549 } > > this path is not broken since any spte can be lockless modifiable will do > lockless update (will always return 'true' in the line 536). > > 2): in mmu_spte_update() > 594 /* > 595 * For the spte updated out of mmu-lock is safe, since > 596 * we always atomicly update it, see the comments in > 597 * spte_has_volatile_bits(). > 598 */ > 599 if (spte_is_locklessly_modifiable(old_spte) && > 600 !is_writable_pte(new_spte)) > 601 ret = true; > > The new_spte is a temp value that can not be fetched by lockless > write-protection and !is_writable_pte() is stable enough (can not be > locklessly write-protected). > > 3) in spte_write_protect() > 1368 if (!is_writable_pte(spte) && > 1369 !spte_is_locklessly_modifiable(spte)) > 1370 return false; > 1371 > > It always do write-protection if the spte is lockelss modifiable. > (This code is the aspect after applying the whole pachset, the code is safe too > before patch "[PATCH v3 14/15] KVM: MMU: clean up spte_write_protect" since > the lockless write-protection path is serialized by a single lock.). > > 4) in set_spte() > 2690 /* > 2691 * Optimization: for pte sync, if spte was writable the hash > 2692 * lookup is unnecessary (and expensive). Write protection > 2693 * is responsibility of mmu_get_page / kvm_sync_page. > 2694 * Same reasoning can be applied to dirty page accounting. > 2695 */ > 2696 if (!can_unsync && is_writable_pte(*sptep)) > 2697 goto set_pte; > > It is used for a optimization and the worst case is the optimization is disabled > (walking the shadow pages in the hast table) when the spte has been locklessly > write-protected. It does not hurt anything since it is a rare event. And the > optimization can be back if we check SPTE_MMU_WRITEABLE instead. > > 5) fast_page_fault() > 3110 /* > 3111 * Check if it is a spurious fault caused by TLB lazily flushed. > 3112 * > 3113 * Need not check the access of upper level table entries since > 3114 * they are always ACC_ALL. > 3115 */ > 3116 if (is_writable_pte(spte)) { > 3117 ret = true; > 3118 goto exit; > 3119 } > > Since kvm_vm_ioctl_get_dirty_log() firstly get-and-clear dirty-bitmap before > do write-protect, the dirty-bitmap will be properly set again when fast_page_fault > fix the spte who is write-protected by lockless write-protection. > > 6) in fast_page_fault's tracepoint: > 244 #define __spte_satisfied(__spte) \ > 245 (__entry->retry && is_writable_pte(__entry->__spte)) > It causes the tracepoint reports the wrong result when fast_page_fault > and tdp_page_fault/lockless-write-protect run concurrently, i guess it's > okay since it's only used for trace. > > 7) in audit_write_protection(): > 202 if (is_writable_pte(*sptep)) > 203 audit_printk(kvm, "shadow page has writable " > 204 "mappings: gfn %llx role %x\n", > 205 sp->gfn, sp->role.word); > It's okay since lockless-write-protection does not update the readonly sptes. > > > > > OK to improve comments later. > > Thank you, Marcelo! > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html