On Fri, 19 Apr 2024, Zack Weinberg wrote:

> On Fri, Apr 19, 2024, at 4:15 PM, Mikulas Patocka wrote:
> > On Fri, 19 Apr 2024, Zack Weinberg wrote:
> >> ... the copy
> >> of round_keys in the vector registers *won't* get erased -- the exact
> >> problem being discussed in this thread.
> >
> > On the SYSV ABI, all the vector registers are volatile, so you can erase
> > them in explicit_bzero.
> >
> > On Windows 64-bit ABI, it is more problematic, because some of the vector
> > registers must be preserved.
>
> Oh, huh. Yes, that would work.

I've just realized that this wouldn't work - if the function
explicit_bzero is lazily resolved, the dynamic linker would spill the
vector registers to the stack prior to calling explicit_bzero.

> Call-preserved registers are not a
> problem, because any function that puts secret data in a call-preserved
> register in the first place, must erase it again (by restoring the old
> value) before returning. Therefore, if we made explicit_bzero wipe *all*
> the call-clobbered registers before returning, my example function would
> be safe.
>
> There's still a place secrets could leak to and not get erased, though:
> register spill slots on the stack. Only the compiler could plug this
> leak. Long term, I think what we want is something like
> __attribute__((sensitive)), which can only be applied to variables with
> automatic storage duration, and which means "erase all copies of this
> variable's value, wherever they wound up, at the end of its lifetime."
> Note that such variables must not be put in call-preserved registers in
> non-leaf functions, because then they might get spilled to the stack by
> a callee, which has no way of knowing that it's just leaked a secret.
> And I suppose we might also want to worry about signal frames. Nobody
> said this was gonna be easy ;-)
>
> zw

Yes.

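For illustration only, here is a rough sketch of the "wipe the call-clobbered
vector registers" idea quoted above. The name wipe_buffer_and_vector_regs is
hypothetical and is not a libc interface. The sketch assumes GCC or Clang
extended asm on x86-64 with the SYSV ABI, where xmm0-xmm15 are all
call-clobbered. It is subject to the caveats from this thread: a lazily bound
call can spill the registers before the function body runs, and compiler spill
slots on the stack are not touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libc API): zero a buffer, then clear the
 * call-clobbered XMM registers. Assumes x86-64 SYSV and GCC/Clang asm. */
void wipe_buffer_and_vector_regs(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL || len == 0);

    /* Volatile stores, so the compiler cannot discard the zeroing as dead. */
    for (size_t i = 0; i < len; i++)
        p[i] = 0;

#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
    /* pxor clears only the low 128 bits of each register; the upper
     * halves of the YMM/ZMM registers are left alone in this sketch. */
    __asm__ volatile(
        "pxor %%xmm0, %%xmm0\n\t"
        "pxor %%xmm1, %%xmm1\n\t"
        "pxor %%xmm2, %%xmm2\n\t"
        "pxor %%xmm3, %%xmm3\n\t"
        "pxor %%xmm4, %%xmm4\n\t"
        "pxor %%xmm5, %%xmm5\n\t"
        "pxor %%xmm6, %%xmm6\n\t"
        "pxor %%xmm7, %%xmm7\n\t"
        "pxor %%xmm8, %%xmm8\n\t"
        "pxor %%xmm9, %%xmm9\n\t"
        "pxor %%xmm10, %%xmm10\n\t"
        "pxor %%xmm11, %%xmm11\n\t"
        "pxor %%xmm12, %%xmm12\n\t"
        "pxor %%xmm13, %%xmm13\n\t"
        "pxor %%xmm14, %%xmm14\n\t"
        "pxor %%xmm15, %%xmm15"
        : /* no outputs */
        : /* no inputs */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15");
#endif
}
```

On other targets the asm is compiled out and only the buffer is zeroed. On the
Windows 64-bit ABI the same clobber list would make the compiler save and
restore xmm6-xmm15, which is why the sketch excludes it.
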
Another problem is varargs - if there is at least one floating-point
argument, the prologue of the called function will store all 8 XMM argument
registers on the stack, regardless of whether they are used. In the past the
compiler didn't do this (it made an indirect jump based on the value in the
%AL register to save only the used registers), but someone probably found
out that indirect jumps are expensive and that storing all 8 registers is
faster.

Mikulas
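
To make the varargs case concrete, here is a minimal sketch of a variadic
function that takes floating-point arguments. The name sum_doubles is made up
for this example. On x86-64 SYSV the caller passes in %AL an upper bound on
the number of vector registers used, and the callee's prologue copies the
argument registers into a register save area on its own stack. Compiling with
something like "gcc -O2 -S" should show whether your compiler stores all of
xmm0-xmm7 when %AL is nonzero, though the exact code depends on the compiler
version and flags.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: sum `count` doubles passed as variadic arguments.
 * On x86-64 SYSV, va_start relies on the register save area that the
 * prologue fills from the argument registers, XMM registers included. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double); /* each argument is read as a double */
    va_end(ap);

    return total;
}
```

A call such as sum_doubles(3, 1.0, 2.0, 3.5) uses only three XMM registers,
but stale contents of the others can still be copied to the stack if the
prologue saves all eight.
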