Oops, sorry, I was forwarding the thread to the HW folks to correct my
understanding. I will update you with the result.

thanks

On 7/16/19 1:05 PM, Singh, Brijesh wrote:
> Here is a thread.. but more recent is available
>
> https://marc.info/?t=156322283300001&r=1&w=2
>
> Paolo, Sean and others have also replied to it which you can see on
> marc.info.
>
> -Brijesh
>
> On 7/16/19 11:20 AM, Liran Alon wrote:
>>
>>
>>> On 16 Jul 2019, at 19:10, Singh, Brijesh <brijesh.singh@xxxxxxx> wrote:
>>>
>>>
>>>
>>> On 7/16/19 10:56 AM, Liran Alon wrote:
>>>>
>>>>
>>>>> On 16 Jul 2019, at 18:48, Singh, Brijesh <brijesh.singh@xxxxxxx> wrote:
>>>>>
>>>>> On 7/15/19 3:30 PM, Liran Alon wrote:
>>>>>> According to AMD Errata 1096:
>>>>>> "On a nested data page fault when CR4.SMAP = 1 and the guest data read generates a SMAP violation, the
>>>>>> GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction
>>>>>> bytes."
>>>>>>
>>>>>> As stated above, errata is encountered when guest read generates a SMAP violation. i.e. vCPU runs
>>>>>> with CPL<3 and CR4.SMAP=1. However, code have mistakenly checked if CPL==3 and CR4.SMAP==0.
>>>>>>
>>>>>
>>>>> The SMAP violation will occur from CPL3 so CPL==3 is a valid check.
>>>>>
>>>>> See [1] for complete discussion
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10808075_-2322479271&d=DwIGaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=RAt8t8nBaCxUPy5OTDkO0n8BMQ5l9oSfLMiL0TLTu6c&s=Nkwe8rTJhygBCIPz27LXrylptjnWyMwB-nJaiowWpWc&e=
>>>>
>>>> I still don’t understand. SMAP is a mechanism which is meant to protect a CPU running in CPL<3 from mistakenly referencing data controllable by CPL==3.
>>>> Therefore, SMAP violation should be raised when CPL<3 and data referenced is mapped in page-tables with PTE with U/S bit set to 1. (i.e. User accessible).
>>>>
>>>> Thus, we should check if CPL<3 and CR4.SMAP==1.
>>>>
>>>
>>> In this particular case we are dealing with NPF and not SMAP fault per
>>> se.
>>>
>>> What typically has happened here is:
>>>
>>> - user space does the MMIO access which causes a fault
>>> - hardware processes this as a VMEXIT
>>> - during processing, hardware attempts to read the instruction bytes to
>>> provide decode assist. This is typically done by data read request from
>>> the RIP that the guest was at. While doing so, we may hit SMAP fault
>>
>> How can a SMAP fault occur when CPL==3? One of the conditions for SMAP is that CPL<3.
>>
>> I think the confusion is that I believe a code mapped as user-accessible in page-tables but runs with CPL<3
>> should be the one which does the MMIO. Rather than code running in CPL==3.
>>
>> The sequence of events I imagine to trigger the Errata is as follows:
>> 1) Guest maps code in page-tables as user-accessible (i.e. PTE with U/S bit set to 1).
>> 2) Guest executes this code with CPL<3 (even though mapped as user-accessible which is a security vulnerability in itself…) which accesses data that is not mapped or marked as reserved in NPT and therefore causes #NPF.
>> 3) Physical CPU DecodeAssist feature attempts to fill-in guest instruction bytes. So it reads as data the guest instructions while CPU is currently with CPL<3, CR4.SMAP=1 and code is mapped as user-accessible. Therefore, this fill-in raises a SMAP violation which causes #NPF to be raised to KVM with 0 instruction bytes.
>>
>> BTW, this also means that in order to trigger this, CR4.SMEP should be set to 0. As otherwise, instruction couldn’t have been executed to raise #NPF in the first place. Maybe we can add this as another condition to recognise the Errata?
>>
>> -Liran
>>
>>> because internally CPU is doing a data read from the RIP to get those
>>> instruction bytes. Since it hit the SMAP fault hence it was not able
>>> to decode the instruction to provide the insn_len. So we are first
>>> checking if it was a fault caused from CPL==3 and SMAP is enabled.
>>> If so, we are hitting this errata and it can be worked around.
>>>
>>> -Brijesh
>>>
>>>
>>>
>>
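
For reference, the candidate conditions debated in the quoted thread can be
written side by side. This is only a sketch in standalone C, not the KVM code:
struct guest_state and all three function names are made up for illustration,
and which condition matches the hardware behaviour is exactly the open
question above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the guest state discussed in the thread. */
struct guest_state {
	unsigned int cpl;	/* current privilege level, 0..3 */
	bool cr4_smap;		/* CR4.SMAP */
	bool cr4_smep;		/* CR4.SMEP */
};

/* Condition as described by Brijesh: fault from CPL==3 with SMAP enabled. */
static inline bool errata_1096_user_check(const struct guest_state *g)
{
	return g->cr4_smap && g->cpl == 3;
}

/* Condition as proposed by Liran: CPL<3 with SMAP enabled. */
static inline bool errata_1096_supervisor_check(const struct guest_state *g)
{
	return g->cr4_smap && g->cpl < 3;
}

/*
 * Liran's optional refinement: additionally require CR4.SMEP == 0, since
 * with SMEP set a supervisor could not have executed user-accessible code
 * in the first place.
 */
static inline bool errata_1096_supervisor_no_smep_check(const struct guest_state *g)
{
	return errata_1096_supervisor_check(g) && !g->cr4_smep;
}
```

With CPL=0, CR4.SMAP=1 and CR4.SMEP=0, the first check is false and the other
two are true; with CPL=3 and CR4.SMAP=1 it is the other way around.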