Re: [PATCH 10/17] prmem: documentation

Igor Stoppa <igor.stoppa@xxxxxxxxx> · Tue, 4 Dec 2018 14:34:31 +0200

Hello,
apologies for the delayed answer.
Please find my reply to both last mails in the thread, below.

On 22/11/2018 22:53, Andy Lutomirski wrote:
On Thu, Nov 22, 2018 at 12:04 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

On Thu, Nov 22, 2018 at 09:27:02PM +0200, Igor Stoppa wrote:
I have studied the code involved with Nadav's patchset.
I am perplexed about these sentences you wrote.

More to the point (to the best of my understanding):

poking_init()
-------------
   1. it gets one random poking address and ensures to have at least 2
      consecutive PTEs from the same PMD
   2. it then proceeds to map/unmap an address from the first of the 2
      consecutive PTEs, so that, later on, there will be no need to
      allocate pages, which might fail, if poking from atomic context.
   3. at this point, the page tables are populated, for the address that
      was obtained at point 1, and this is ok, because the address is fixed

write_rare
----------
   4. it can happen on any available core / thread at any time, therefore
      each of them needs a different address

No?  Each CPU has its own CR3 (eg each CPU might be running a different
user task).  If you have _one_ address for each allocation, it may or
may not be mapped on other CPUs at the same time -- you simply don't care.

Yes, somehow I lost track of that aspect.

The writable address can even be a simple formula to calculate from
the read-only address, you don't have to allocate an address in the
writable mapping space.

Agreed.  I suggest the formula:

writable_address = readable_address - rare_write_offset.  For
starters, rare_write_offset can just be a constant.  If we want to get
fancy later on, it can be randomized.

ok, I hope I captured it here [1]

If we do it like this, then we don't need to modify any pagetables at
all when we do a rare write.  Instead we can set up the mapping at
boot or when we allocate the rare write space, and the actual rare
write code can just switch mms and do the write.

I did it. I have little feeling about the actual amount of data 
involved, but there is a (probably very remote) chance that the remap 
wouldn't work, at least in the current implementation.

It's a bit different from what I had in mind initially, since I was 
thinking to have one single approach to both statically allocated memory 
(is there a better way to describe it) and what is provided from the 
allocator that would come next.

As I wrote, I do not particularly like the way I implemented multiple 
functionality vs remapping, but I couldn't figure out any better way to 
do it, so eventually I kept this one, hoping to get some advice on how 
to improve it.

I did not provide yet an example, yet, but IMA has some flags that are 
probably very suitable, since they depend on policy reloading, which can 
happen multiple times, but could be used to disable it.

[1] https://www.openwall.com/lists/kernel-hardening/2018/12/04/3

--
igor