On Tue, Jul 30, 2013 at 02:13:38PM +0200, Paolo Bonzini wrote:
> Il 30/07/2013 13:56, Gleb Natapov ha scritto:
> >>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>> index 4c4274d..b5273c3 100644
> >>> --- a/arch/x86/kvm/mmu.c
> >>> +++ b/arch/x86/kvm/mmu.c
> >>> @@ -3494,6 +3494,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
> >>>  	return mmu->last_pte_bitmap & (1 << index);
> >>>  }
> >>>
> >>> +#define PTTYPE_EPT 18 /* arbitrary */
> >>> +#define PTTYPE PTTYPE_EPT
> >>> +#include "paging_tmpl.h"
> >>> +#undef PTTYPE
> >>> +
> >>>  #define PTTYPE 64
> >>>  #include "paging_tmpl.h"
> >>>  #undef PTTYPE
> >>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> >>> index 7581395..e38b3c0 100644
> >>> --- a/arch/x86/kvm/paging_tmpl.h
> >>> +++ b/arch/x86/kvm/paging_tmpl.h
> >>> @@ -58,6 +58,21 @@
> >>>  	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
> >>>  	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
> >>>  	#define CMPXCHG cmpxchg
> >>> +#elif PTTYPE == PTTYPE_EPT
> >>> +	#define pt_element_t u64
> >>> +	#define guest_walker guest_walkerEPT
> >>> +	#define FNAME(name) ept_##name
> >>> +	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
> >>> +	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
> >>> +	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
> >>> +	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
> >>> +	#define PT_LEVEL_BITS PT64_LEVEL_BITS
> >>> +	#define PT_GUEST_ACCESSED_MASK 0
> >>> +	#define PT_GUEST_DIRTY_MASK 0
> >>> +	#define PT_GUEST_DIRTY_SHIFT 0
> >>> +	#define PT_GUEST_ACCESSED_SHIFT 0
> >>> +	#define CMPXCHG cmpxchg64
> >>> +	#define PT_MAX_FULL_LEVELS 4
> >>>  #else
> >>>  	#error Invalid PTTYPE value
> >>>  #endif
> >>
> >> Please add a
> >>
> >> BUILD_BUG_ON(!!PT_GUEST_ACCESSED_MASK != !!PT_GUEST_DIRTY_MASK);
> >> #if PT_GUEST_ACCESSED_MASK
> >> BUILD_BUG_ON(PT_GUEST_DIRTY_SHIFT <= PT_GUEST_ACCESSED_SHIFT);
> >> BUILD_BUG_ON(PT_GUEST_DIRTY_SHIFT <= PT_WRITABLE_SHIFT);
> >> BUILD_BUG_ON(PT_GUEST_DIRTY_MASK != (1 << PT_GUEST_DIRTY_SHIFT));
> >> BUILD_BUG_ON(PT_GUEST_ACCESSED_MASK != (1 << PT_GUEST_ACCESSED_SHIFT));
> >
> > This will not work if I define _SHIFT to be 8/9 now.
>
> They will because I put them under an appropriate "#if". :)
>
True, missed that.

> OTOH if you define _SHIFT to be 8/9 you can move the #if so that it only
> covers the last two checks.
>
> > But we do not use
> > BUILD_BUG_ON() to check values from the same define "namespace". It is
> > implied that they are correct and when they change all "namespace"
> > remains to be consistent. If you look at BUILD_BUG_ON() that we have
> > (and this patch adds) they are from the form:
> > PT_WRITABLE_MASK != ACC_WRITE_MASK
> > ACC_WRITE_MASK != VMX_EPT_WRITABLE_MASK
> > VMX_EPT_READABLE_MASK != PT_PRESENT_MASK
> > VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK
> >
> > i.e they compare value from different "namespaces".
>
> Yes, all these BUILD_BUG_ONs make sense.
>
> But I think BUILD_BUG_ON() is more generically for consistency checks
> and enforcing invariants that the code expects. Our invariants are:
>
> * A/D bits are enabled or disabled in pairs
>
> * dirty is the left of accessed and writable
>
> * masks should either be zero or match the corresponding shifts
>
> The alternative to BUILD_BUG_ON would be a comment that explains the
> invariants, but there's no need to use English if C can do it better! :)
>
OK, will add this in separate patch.
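For concreteness, here is a minimal sketch of how those checks could be
packaged once the EPT case is in place. The helper name and its placement
are purely illustrative (BUILD_BUG_ON() expands to a statement, so the
checks must live inside some function compiled for each PTTYPE
instantiation of paging_tmpl.h); the eventual separate patch may look
different:

/*
 * Sketch only: consistency checks for the per-PTTYPE A/D definitions.
 * "static inline" keeps the helper from triggering unused-function
 * warnings if nothing ends up calling it.
 */
static inline void FNAME(check_pte_invariants)(void)
{
	/* A/D bits are enabled or disabled in pairs. */
	BUILD_BUG_ON(!!PT_GUEST_ACCESSED_MASK != !!PT_GUEST_DIRTY_MASK);
#if PT_GUEST_ACCESSED_MASK
	/* Dirty sits to the left of accessed and writable... */
	BUILD_BUG_ON(PT_GUEST_DIRTY_SHIFT <= PT_GUEST_ACCESSED_SHIFT);
	BUILD_BUG_ON(PT_GUEST_DIRTY_SHIFT <= PT_WRITABLE_SHIFT);
	/* ...and non-zero masks match the corresponding shifts. */
	BUILD_BUG_ON(PT_GUEST_DIRTY_MASK != (1 << PT_GUEST_DIRTY_SHIFT));
	BUILD_BUG_ON(PT_GUEST_ACCESSED_MASK != (1 << PT_GUEST_ACCESSED_SHIFT));
#endif
}

If the EPT shifts are later defined to 8/9 while the masks stay 0, only
the last two (mask-versus-shift) checks need to remain under the #if, as
noted above; the ordering checks hold either way.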
> >>>  	pt_element_t pte;
> >>>  	pt_element_t __user *uninitialized_var(ptep_user);
> >>>  	gfn_t table_gfn;
> >>> @@ -322,7 +351,9 @@ retry_walk:
> >>>  		accessed_dirty &= pte >>
> >>>  			(PT_GUEST_DIRTY_SHIFT - PT_GUEST_ACCESSED_SHIFT);
> >>
> >> This shift is one of two things that bugged me. I dislike including
> >> "wrong" code just because it is dead. Perhaps you can #define the
> > Meanwhile I changed comment above this to be:
> > 	/*
> > 	 * On a write fault, fold the dirty bit into accessed_dirty.
> > 	 * For modes without A/D bits support accessed_dirty will be
> > 	 * always clear.
> > 	 */
>
> Good.
>
> >> shifts to 8 and 9 already now, even if the masks stay 0?
> >>
> > Currently I do not see any problem with that, but we will have to be careful
> > that *_SHIFT values will not leak into a code where it could matter.
>
> They would leak with PT_GUEST_*_SHIFT == 0 too, I think? (And with
> worse effects, because they would use bit 0 and/or do shifts with
> negative counts).
>
> Perhaps PT_GUEST_*_SHIFT can instead be defined to a function like
> __using_nonexistent_pte_bit(), so that compilation fails unless all such
> references are optimized away. See arch/x86/include/asm/cmpxchg.h for
> an example:
>
> /*
>  * Non-existant functions to indicate usage errors at link time
>  * (or compile-time if the compiler implements __compiletime_error().
>  */
> extern void __xchg_wrong_size(void)
> 	__compiletime_error("Bad argument size for xchg");
>
Nice! But not all of them are optimized out at the correct stage. The one
in the accessed_dirty calculation is reachable, but its result is thrown
away. If I define *_SHIFT to be a function, the code cannot be optimized
out, because the compiler assumes the function call has side effects.
Adding the pure function attribute solves that, though, so I'll go with
that.

--
			Gleb.
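For illustration, a minimal sketch of the scheme discussed above, modeled
on the cmpxchg.h declaration quoted earlier; the identifier, return type,
and exact placement are assumptions, not the actual follow-up patch:

/*
 * Sketch only: declared but never defined, so any call that survives
 * optimization fails the build.  Assumes the kernel's __pure and
 * __compiletime_error() helpers from <linux/compiler.h>.
 */
extern __pure u64 __using_nonexistent_pte_bit(void)
	__compiletime_error("wrong use of PT_GUEST_*_SHIFT");

/* In the EPT block of paging_tmpl.h, instead of defining the shifts to 0: */
#define PT_GUEST_DIRTY_SHIFT	__using_nonexistent_pte_bit()
#define PT_GUEST_ACCESSED_SHIFT	__using_nonexistent_pte_bit()

With the EPT masks defined to 0, accessed_dirty is already zero when the
walker reaches the fold above, so the value of the shift expression cannot
matter; marking the fake function __pure tells the compiler the "call" has
no side effects, letting it eliminate the whole expression so the
undefined symbol is never referenced, while any meaningful use of the
shifts on the EPT path still breaks the build.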