Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU

Boaz Harrosh <boazh@xxxxxxxxxx> · Tue, 15 May 2018 14:41:41 +0300

On 15/05/18 14:11, Matthew Wilcox wrote:
> On Tue, May 15, 2018 at 01:43:23PM +0300, Boaz Harrosh wrote:
>> On 15/05/18 03:41, Matthew Wilcox wrote:
>>> On Mon, May 14, 2018 at 10:37:38PM +0300, Boaz Harrosh wrote:
>>>> On 14/05/18 22:15, Matthew Wilcox wrote:
>>>>> On Mon, May 14, 2018 at 08:28:01PM +0300, Boaz Harrosh wrote:
>>>>>> On a call to mmap, an mmap provider (like an FS) can put
>>>>>> this flag on vma->vm_flags.
>>>>>>
>>>>>> The VM_LOCAL_CPU flag tells the Kernel that the vma will be used
>>>>>> from a single core only, and therefore invalidation (flush_tlb) of
>>>>>> its PTE(s) need not be scheduled across all CPUs.
>>>>>
>>>>> I still don't get this.  You're opening the kernel up to being exploited
>>>>> by any application which can persuade it to set this flag on a VMA.
>>>>>
>>>>
>>>> No, no, this is not an application-accessible flag; it can only be set
>>>> by the mmap implementor at ->mmap() time (say, the same as VM_MIXEDMAP).
>>>>
>>>> Please see the zuf patches for usage (again, apologies for pushing this
>>>> before a user).
>>>>
>>>> The mmap provider has all the facilities to know that this cannot be
>>>> abused, not even by a trusted Server.
>>>
>>> I don't think page tables work the way you think they work.
>>>
>>> +               err = vm_insert_pfn_prot(zt->vma, zt_addr, pfn, prot);
>>>
>>> That doesn't just insert it into the local CPU's page table.  Any CPU
>>> which directly accesses or even prefetches that address will also get
>>> the translation into its cache.
>>
>> Yes, I know, but that is exactly the point of this flag. I know that this
>> address is only ever accessed from a single core, because it is an mmap (vma)
>> of an O_TMPFILE-exclusive file created in a core-pinned thread, and I allow
>> only that thread any kind of access to this vma. Both the file handle and the
>> mmapped pointer are kept on the thread stack and are not accessible from outside.
>>
>> So the whole point of this flag is the kernel driver telling mm that this
>> address is guaranteed to be accessed only from one core-pinned thread.
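In user-space terms, the confinement described above (a core-pinned thread that owns an O_TMPFILE mapping and keeps both the fd and the pointer on its own stack) looks roughly like the sketch below. It is plain Linux/pthreads code, not the zus implementation; the helper names, the 4 KiB size and the choice of /dev/shm with a /tmp fallback are assumptions for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_LEN 4096

/* Result reported back by the (hypothetical) pinned worker. */
struct pinned_result {
	int cpu;  /* CPU the worker pinned itself to */
	int ok;   /* 1 if the mapping was created, used and torn down */
};

/* Open an unnamed file. tmpfs (/dev/shm) supports O_TMPFILE; /tmp is only a
 * fallback. O_EXCL prevents a later linkat() from giving the file a name. */
static int open_exclusive_tmpfile(void)
{
	static const char *const dirs[] = { "/dev/shm", "/tmp" };

	for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++) {
		int fd = open(dirs[i], O_TMPFILE | O_RDWR | O_EXCL, 0600);
		if (fd >= 0)
			return fd;
	}
	return -1;
}

static void *pinned_worker(void *arg)
{
	struct pinned_result *res = arg;
	cpu_set_t set;

	res->ok = 0;
	res->cpu = sched_getcpu();
	if (res->cpu < 0)
		return NULL;

	/* Restrict this thread to the single CPU it is currently running on. */
	CPU_ZERO(&set);
	CPU_SET(res->cpu, &set);
	if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
		return NULL;

	/* fd and p live only in this stack frame and are never published. */
	int fd = open_exclusive_tmpfile();
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, REGION_LEN) != 0) {
		close(fd);
		return NULL;
	}
	char *p = mmap(NULL, REGION_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return NULL;
	}

	memset(p, 0x5a, REGION_LEN);
	/* The invariant the flag depends on: every access comes from this CPU. */
	assert(sched_getcpu() == res->cpu);
	res->ok = (p[REGION_LEN - 1] == 0x5a);

	munmap(p, REGION_LEN);
	close(fd);
	return NULL;
}

/* Run the worker on its own thread and wait for it; returns 0 on success. */
static int run_pinned_worker(struct pinned_result *res)
{
	pthread_t t;

	if (pthread_create(&t, NULL, pinned_worker, res) != 0)
		return -1;
	return pthread_join(t, NULL);
}
```

The confinement here is a property of how the owning code is written; the flag is the driver vouching to mm that its code follows this discipline.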
> 
> You're still thinking about this from the wrong perspective.  If you
> were writing a program to attack this facility, how would you do it?
> It's not exactly hard to leak one pointer's worth of information.
> 

That would be very hard, because that program would:
- need to be root
- need to start up and pretend to be the zus Server, with the whole mount
  thread thing: register a new filesystem, grab some pmem devices
- mount that filesystem on that pmem, create core-pinned ZT threads
  for all CPUs, and start accepting IO
- and only then could it start leaking the pointer and doing bad things.
  Those bad things would be done to the application, not to the Kernel.
  And as a full filesystem it could already do them to the application
  directly, through the front door, without needing the mismatched TLB at all.
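For concreteness, this is what "leaking the pointer" amounts to: page tables belong to the process (its mm), not to a CPU, so any other thread of the same process that obtains the address can dereference it from whichever core it is scheduled on. A minimal user-space sketch, assuming Linux and pthreads; the names are hypothetical and an anonymous mapping stands in for the zus one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_LEN 4096

/* Hypothetical names; an anonymous mapping stands in for the zus mapping. */
static uint32_t *leaked_ptr;   /* the "one pointer's worth" that escaped */
static pthread_t owner;        /* thread that created the mapping */

static void *reader(void *arg)
{
	uint32_t *seen = arg;

	/* Sanity check: this really is a different thread from the owner. */
	assert(!pthread_equal(pthread_self(), owner));
	/* Same mm, same page tables: the dereference works on any CPU. */
	*seen = *leaked_ptr;
	return NULL;
}

/* Create a mapping, write a value, hand the address to a second thread and
 * return what that thread read through it (0 on setup failure). */
static uint32_t read_through_leaked_pointer(uint32_t value)
{
	uint32_t seen = 0;
	pthread_t t;

	void *p = mmap(NULL, PAGE_LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 0;

	owner = pthread_self();
	leaked_ptr = p;
	*leaked_ptr = value;

	/* pthread_create() orders the write above before the reader runs. */
	if (pthread_create(&t, NULL, reader, &seen) == 0)
		pthread_join(t, NULL);

	munmap(p, PAGE_LEN);
	return seen;
}
```

Nothing on this path checks which CPU the reader runs on. With VM_LOCAL_CPU set, a PTE invalidation would not reach the reader's CPU, which is the mismatched-TLB case; what keeps such a reader away is only the mmap provider never letting the address out of the pinned thread.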

That said, it brings up a very important point that I wanted to talk about.
In this design the zuf (Kernel) and the zus (user-mode Server) are part of the distribution.
I would like to have the zus module signed by the distro Kernel's key and
checked at load time. I know there is an effort by Red Hat guys to sign all
/sbin/* servers and have the Kernel check them, so this is not the first time
people have thought about that.

Thanks
Boaz