On 28/03/2014 18:36, Feng Wu wrote:
> +	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);

You are overwriting this variable below, but that is not okay because
the value of CR4 must be considered separately in each iteration.  This
also hides an uninitialized-variable bug for "smap" in the EPT case.
To avoid that, rename this variable to cr4_smap; it's probably better
to rename smep to cr4_smep too.

> 	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
> 		pfec = byte << 1;
> 		map = 0;
> 		wf = pfec & PFERR_WRITE_MASK;
> 		uf = pfec & PFERR_USER_MASK;
> 		ff = pfec & PFERR_FETCH_MASK;
> +		smapf = pfec & PFERR_RSVD_MASK;

The reader will expect PFERR_RSVD_MASK to be zero here.  So please add
a comment along the lines of "/* PFERR_RSVD_MASK is set in pfec if ... */".

> 		for (bit = 0; bit < 8; ++bit) {
> 			x = bit & ACC_EXEC_MASK;
> 			w = bit & ACC_WRITE_MASK;
> @@ -3627,11 +3629,27 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
> 				w |= !is_write_protection(vcpu) && !uf;
> 				/* Disallow supervisor fetches of user code if cr4.smep */
> 				x &= !(smep && u && !uf);
> +
> +				/*
> +				 * SMAP:kernel-mode data accesses from user-mode
> +				 * mappings should fault. A fault is considered
> +				 * as a SMAP violation if all of the following
> +				 * conditions are ture:
> +				 *   - X86_CR4_SMAP is set in CR4
> +				 *   - An user page is accessed
> +				 *   - Page fault in kernel mode
> +				 *   - !(CPL<3 && X86_EFLAGS_AC is set)
> +				 *
> +				 * Here, we cover the first three conditions,
> +				 * we need to check CPL and X86_EFLAGS_AC in
> +				 * permission_fault() dynamiccally

"dynamically".  These three lines however are not entirely correct: we
do cover the last condition here, it is in smapf.  So perhaps something
like

   * Here, we cover the first three conditions.
   * The CPL and X86_EFLAGS_AC check is in smapf,
   * which permission_fault() computes dynamically.

> +				 */
> +				smap = smap && smapf && u && !uf;

SMAP does not affect instruction fetches.  Do you need "&& !ff" here?
Perhaps it's clearer to add it even if it is not strictly necessary.
Please write just "smap = cr4_smap && u && !uf && !ff" here, and add
smapf back below in the assignment to "fault".  This makes the code
more homogeneous.

> 			} else
> 				/* Not really needed: no U/S accesses on ept */
> 				u = 1;
> -			fault = (ff && !x) || (uf && !u) || (wf && !w);
> +			fault = (ff && !x) || (uf && !u) || (wf && !w) || smap;

...

> +
> +	/*
> +	 * If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1.
> +	 *
> +	 * If CPL = 3, SMAP applies to all supervisor-mode data accesses
> +	 * (these are implicit supervisor accesses) regardless of the value
> +	 * of EFLAGS.AC.
> +	 *
> +	 * So we need to check CPL and EFLAGS.AC to detect whether there is
> +	 * a SMAP violation.
> +	 */
> +
> +	smapf = ((mmu->permissions[(pfec|PFERR_RSVD_MASK) >> 1] >> pte_access) &
> +		  1) && !((cpl < 3) && ((rflags & X86_EFLAGS_AC) == 1));
> +
> +	return ((mmu->permissions[pfec >> 1] >> pte_access) & 1) || smapf;

You do not need two separate accesses.  Just add PFERR_RSVD_MASK to
pfec if the conditions for SMAP are satisfied.  There are two
possibilities:

1) setting PFERR_RSVD_MASK if SMAP is being enforced, that is if
   CPL = 3 || AC = 0.  This is what you are doing now.

2) setting PFERR_RSVD_MASK if SMAP is being overridden, that is if
   CPL < 3 && AC = 1.  You then have to invert the bit in
   update_permission_bitmask.

Please consider both choices, and pick the one that gives better code.

Also, this must be written in a branchless way.  Branchless tricks are
common throughout the MMU code because the hit rate of most branches is
pretty much 50%-50%.  This is also true in this case, at least if SMAP
is in use (if it is not in use, we'll have AC = 0 most of the time).
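To show the kind of trick I mean, here is a generic branchless select
(an illustration of the general idea only, not code from the patch or
from KVM):

	/* Illustration only: pick a or b without a conditional jump.
	 * cond must be 0 or 1; negating it yields 0 or ~0UL, and the
	 * mask then keeps every bit of one operand and none of the other.
	 */
	static unsigned long branchless_select(unsigned long cond,
					       unsigned long a,
					       unsigned long b)
	{
		unsigned long mask = -cond;		/* 0 or ~0UL */
		return (a & mask) | (b & ~mask);	/* cond ? a : b */
	}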
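For concreteness, possibility 1 could look roughly like this at the end
of permission_fault(), still written with a branch (a sketch reusing
the names from the quoted patch, not the final answer):

	/* Sketch of possibility 1, deliberately not branchless yet:
	 * fold the SMAP state into pfec so the permissions table is
	 * read only once.  PFERR_RSVD_MASK is never set in a real pfec
	 * here, so the bit is free to mean "SMAP is being enforced".
	 */
	if (cpl == 3 || !(rflags & X86_EFLAGS_AC))
		pfec |= PFERR_RSVD_MASK;

	return (mmu->permissions[pfec >> 1] >> pte_access) & 1;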
I don't want to spoil the fun, but I don't want to waste your time
either, so I rot13'ed my solution and placed it after the signature. ;)

As to nested virtualization, I reread the code and it should already
work, because it sets PFERR_USER_MASK.  This means uf = 1, and a SMAP
fault will never trigger with uf = 1.

Thanks for following this!  Please include "v3" in the patch subject
on your next post!

Paolo

------------------------------------- 8< --------------------------------------

Nqq qrsvavgvbaf sbe CSREE_*_OVG (0 sbe cerfrag, 1 sbe jevgr, rgp.) naq
hfr gur sbyybjvat:

	vag vaqrk, fznc;

	/*
	 * Vs PCY < 3, FZNC cebgrpgvbaf ner qvfnoyrq vs RSYNTF.NP = 1.
	 *
	 * Vs PCY = 3, FZNC nccyvrf gb nyy fhcreivfbe-zbqr qngn npprffrf
	 * (gurfr ner vzcyvpvg fhcreivfbe npprffrf) ertneqyrff bs gur inyhr
	 * bs RSYNTF.NP.
	 *
	 * Guvf pbzchgrf (pcy < 3) && (esyntf & K86_RSYNTF_NP), yrnivat
	 * gur erfhyg va K86_RSYNTF_NP.  Jr gura vafreg vg va cynpr
	 * bs gur CSREE_EFIQ_ZNFX ovg; guvf ovg jvyy nyjnlf or mreb va csrp,
	 * ohg vg jvyy or bar va vaqrk vs FZNC purpxf ner orvat bireevqqra.
	 * Vg vf vzcbegnag gb xrrc guvf oenapuyrff.
	 */
	fznc = (pcy - 3) & (esyntf & K86_RSYNTF_NP);
	vaqrk = (csrp >> 1) + (fznc >> (K86_RSYNTF_NP_OVG - CSREE_EFIQ_OVG + 1));

	erghea (zzh->crezvffvbaf[vaqrk] >> cgr_npprff) & 1;

Gur qverpgvba bs CSREE_EFIQ_ZNFX vf gur bccbfvgr pbzcnerq gb lbhe pbqr.