Re: [RFC v2 00/27] Kernel Address Space Isolation

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Fri, 12 Jul 2019 10:45:06 -0600

> On Jul 12, 2019, at 10:37 AM, Alexandre Chartre <alexandre.chartre@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 7/12/19 5:16 PM, Thomas Gleixner wrote:
>>> On Fri, 12 Jul 2019, Peter Zijlstra wrote:
>>>> On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:
>>>> 
>>>> I think that's precisely what makes ASI and PTI different and independent.
>>>> PTI is just about switching between userland and kernel page-tables, while
>>>> ASI is about switching page-tables inside the kernel. You can have ASI without
>>>> having PTI. You can also use ASI for kernel threads, i.e. for code that won't
>>>> be triggered from userland and so won't involve PTI.
>>> 
>>> PTI is not mapping         kernel space to avoid             speculation crap (meltdown).
>>> ASI is not mapping part of kernel space to avoid (different) speculation crap (MDS).
>>> 
>>> See how very similar they are?
>>> 
>>> Furthermore, to recover SMT for userspace (under MDS) we not only need
>>> core-scheduling but core-scheduling per address space. And ASI was
>>> specifically designed to help mitigate the trainwreck just described.
>>> 
>>> By explicitly exposing a (hopefully harmless) part of the kernel to MDS,
>>> we reduce the part that needs core-scheduling and thus reduce the rate
>>> at which the SMT siblings need to sync up/schedule.
>>> 
>>> But looking at it that way, it makes no sense to retain 3 address
>>> spaces, namely:
>>> 
>>>   user / kernel exposed / kernel private.
>>> 
>>> Specifically, it makes no sense to expose part of the kernel through MDS
>>> but not through Meltdown. Therefore we can merge the user and kernel
>>> exposed address spaces.
>>> 
>>> And then we've fully replaced PTI.
>>> 
>>> So no, they're not orthogonal.
>> Right. If we decide to expose more parts of the kernel mappings then that's
>> just adding more stuff to the existing user (PTI) map mechanics.
> 
> If we expose more parts of the kernel mapping by adding them to the existing
> user (PTI) map, then we only control the mapping of kernel-sensitive data but
> we don't control user mappings (with ASI, we exclude all user mappings).
>
> How would you control the mapping of userland-sensitive data and exclude it
> from the user map?

As I see it, if we think part of the kernel is okay to leak to VM guests, then we should think it’s okay to leak to userspace, and vice versa. At the end of the day, this may just have to come down to an administrator’s choice of how careful the mitigations need to be.

> Would you have the application explicitly identify sensitive
> data (like Andy suggested with a /dev/xpfo device)?

That’s not really the intent of my suggestion. I was suggesting that maybe we don’t need ASI at all if we allow VMs to exclude their memory from the kernel mapping entirely.  Heck, in a setup like this, we can maybe even get away with turning PTI off under very, very controlled circumstances.  I’m not quite sure what to do about the kernel random pools, though.