On Tue 11-07-17 12:32:57, Ram Pai wrote:
> On Tue, Jul 11, 2017 at 04:52:46PM +0200, Michal Hocko wrote:
> > On Wed 05-07-17 14:21:37, Ram Pai wrote:
> > > Memory protection keys enable applications to protect their
> > > address space from inadvertent access or corruption from
> > > themselves.
> > >
> > > The overall idea:
> > >
> > > A process allocates a key and associates it with
> > > an address range within its address space.
> > > The process can then dynamically set read/write
> > > permissions on the key without involving the
> > > kernel. Any code that violates the permissions
> > > of the address space, as defined by its associated
> > > key, will receive a segmentation fault.
> > >
> > > This patch series enables the feature on the PPC64 HPTE
> > > platform.
> > >
> > > ISA 3.0 section 5.7.13 describes the detailed specifications.
> >
> > Could you describe the high-level design of this feature in the cover
> > letter.
>
> Yes, it can be hard to understand without the big picture. I will
> provide the high-level design and the rationale behind the patch split
> towards the end. I will also have it in the cover letter for my next
> revision of the patchset.

Thanks!

> > I have tried to get some idea from the patchset but it was
> > really far from trivial. Patches are not very well split up (many
> > helpers are added without their users etc.).
>
> I see your point. Earlier, I had the patches split in such a way that
> the users of the helpers were in the same patch as the helper itself.
> But then comments from others led to the current split.

It is not my call here, obviously. I cannot review arch-specific parts
due to lack of familiarity, but it is a generally good practice to
include helpers along with their users to make the usage clear. Also,
as much as I like small patches because they are easier to review,
having very many of them can lead to a harder review in the end because
you easily lose the higher-level overview.
> > > Testing:
> > >   This patch series has passed all the protection key
> > >   tests available in the selftests directory.
> > >   The tests are updated to work on both x86 and powerpc.
> > >
> > > version v5:
> > >   (1) reverted back to the old design -- store the
> > >       key in the pte, instead of bypassing it.
> > >       The v4 design slowed down the hash page path.
> >
> > This surprised me a lot but I couldn't find the respective code. Why
> > do you need to store anything in the pte? My understanding of PKEYs
> > is that the setup and teardown should be very cheap and so no page
> > tables have to be updated. Or do I just misunderstand what you wrote
> > here?
>
> Ideally the MMU looks at the PTE for keys, in order to enforce
> protection. This is the case with x86 and is the case with the Power9
> Radix page table. Hence the keys have to be programmed into the PTE.

But x86 doesn't update ptes for PKEYs, that would be just too
expensive. You could use standard mprotect to do the same...

> However, with the HPT on Power, these keys do not necessarily have to
> be programmed into the PTE. We could bypass the Linux Page Table Entry
> (PTE) and instead just program them into the Hash Page Table Entry
> (HPTE), since the MMU does not refer to the PTE but to the HPTE. The
> last version of the patchset attempted to do that. It worked as
> follows:
>
> a) When an address range is requested to be associated with a key by
>    the application through the key_mprotect() system call, the kernel
>    stores that key in the VMAs corresponding to that address range.
>
> b) Whenever there is a hash page fault for that address, the fault
>    handler reads the key from the VMA and programs the key into the
>    HPTE. __hash_page() is the function that does that.

What causes the fault here?

> c) Once the HPTE is programmed, the MMU can sense key violations and
>    generate key faults.
>
> The problem is with step (b). This step is a very critical path which
> is performance sensitive. We don't want to add any delays there.
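The dropped v4 flow described in steps (a)-(c) can be summarized in rough pseudocode. This is not actual kernel code: only key_mprotect() and __hash_page() are names from the mail above; the other helpers and field names are invented for illustration:

```
key_mprotect(addr, len, prot, key):
    for each vma overlapping [addr, addr+len):
        vma->vm_pkey = key          # key lives only in the VMA
    # no PTE or HPTE is touched here; the HPTE for a page may not
    # even exist yet

__hash_page(ea, ...):               # hash-fault hot path
    pte = lookup_linux_pte(ea)      # PTE carries no key in v4
    vma = find_vma(current->mm, ea) # needs the mmap semaphore --
                                    # this is the costly step
    hpte_flags |= key_to_hpte_bits(vma->vm_pkey)
    insert_hpte(ea, pte, hpte_flags)
```

The VMA lookup in the hash-fault path is what made this design too slow, as the next paragraph explains.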
> However, if we want to access the key from the VMA, we will have to
> hold the VMA semaphore, and that is a big no-no. As a result, this
> design had to be dropped.
>
> I reverted back to the old design, i.e. the design in the v4 version.
> In this version we do the following:
>
> a) When an address range is requested to be associated with a key by
>    the application through the key_mprotect() system call, the kernel
>    stores that key in the VMAs corresponding to that address range.
>    The kernel also programs the key into the Linux PTEs corresponding
>    to all the pages associated with the address range.

OK, so how is this any different from the regular mprotect then?

> b) Whenever there is a hash page fault for that address, the fault
>    handler reads the key from the Linux PTE and programs the key into
>    the HPTE.
>
> c) Once the HPTE is programmed, the MMU can sense key violations and
>    generate key faults.
>
> Since step (b) in this case has easy access to the Linux PTE, and
> hence to the key, it is fast to access it and program the HPTE. Thus
> we avoid taking any performance hit on this critical path.
>
> Hope this explains the rationale.
>
> As promised, here is the high-level design:

I will read through that later.

[...]
--
Michal Hocko
SUSE Labs