On Thu, Jan 20, 2022, Andrew Cooper wrote: > On 20/01/2022 16:36, Sean Christopherson wrote: > > On Thu, Jan 20, 2022, Andrew Cooper wrote: > >> On 20/01/2022 00:06, Sean Christopherson wrote: > >>> MOVSS blocking can be initiated by userspace, but can be coincident with > >>> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is > >>> executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking > >>> is problematic only for CPL0 (and only if the guest is crazy enough to > >>> access a DR in a MOVSS shadow). All other sources of #DBs are either > >>> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), > >> It is more complicated than this and undocumented. Single step is > >> discard in a shadow, while data breakpoints are deferred. > > But for the purposes of making the consitency check happy, whether they are > > deferred or dropped should be irrelevant, no? > > From that point of view, yes. The consistency check is specific to TS. > I suppose I was mostly questioning the wording of the explanation. > > >>> are mutually exclusive with MOVSS blocking (T-bit task switch), > >> Howso? MovSS prevents external interrupts from triggering task > >> switches, but instruction sources still trigger in a shadow. > > T-bit #DBs are traps, and arrive after the task switch has completed. The switch > > can be initiated in the shadow, but the #DB will be delivered after the instruction > > retires and so after MOVSS blocking goes away. Or am I missing something? > > Well - this is where the pipeline RTL is needed, in lieu of anything > better. Trap-style #DBs are part of the current instruction, and > specifically ahead (in the instruction cycle) of the subsequent intchk. And T-bit traps in particular have crazy high priority... > There are implementations where NMI/INTR/etc won't be delivered at the > head of an exception generated in a shadow, which would suggest that > these implementations have the falling edge of the shadow after intchk > on the instruction boundary. (Probably certainly what happens is that > intchk is responsible for clearing the shadow, but this is entirely > guesswork on my behalf.) Well, thankfully hardware's behavior should be moot for VM-Entry since task switches unconditionally VM-Exit, and KVM has a big fat TODO for handling the T-bit. > >> and splitlock which is new since I last thought about this problem. > > Eww. Split Lock is trap-like, which begs the question of what happens if the > > MOV/POP SS splits a cache line when loading the source data. I'm guess it's > > suppressed, a la data breakpoints, but that'd be a fun one to test. > > They're both reads of their memory operand, so aren't eligible to be > locked accesses. Hah, right, the "lock" part of "split lock" is just a minor detail... > However, a devious kernel can misalign the GDT/LDT such that setting the > descriptor access bit does trigger a splitlock. I suppose "kernel > doesn't misalign structures", or "kernel doesn't write a descriptor with > the access bit clear" are both valid mitigations. > > >>> This bug was originally found by running tests[1] created for XSA-308[2]. > >>> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is > >>> presumably why the Xen bug was deemed to be an exploitable DOS from guest > >>> userspace. > >> As I recall, the original report to the security team was something > >> along the lines of "Steam has just updated game, and now when I start > >> it, the VM explodes". > > Lovely. I wonder if the game added some form of anti-cheat? I don't suppose you > > have disassembly from the report? I'm super curious what on earth a game would > > do to trigger this. > > Anti-cheat was my guess too, but no disassembly happened. > > I was already aware of the STI issue, and had posted > https://lore.kernel.org/xen-devel/1528120755-17455-11-git-send-email-andrew.cooper3@xxxxxxxxxx/ > more than a year previously. The security report showed ICEBP pending > in the INTR_INFO field, and extending the STI test case in light of this > was all of 30s of work to get a working repro. > > ~Andrew