It's certainly not read-only in general. Just having a quick look to verify, the
very first callback I landed on was clear_refs_pte_range(), which implements
.pmd_entry to clear the softdirty and access flags from a leaf pmd or from all
the child ptes.
Yup sorry I misspoke, working some long hours atm so forgive me :) what I meant
to say is that we either read or modify existing.
And yes users do do potentially crazy things and yada yada.
David and I have spoken quite a few times about implementing generic page
table code that could help abstract a lot of things, and it feels like this
logic could all be rejigged in some fashion as to prevent the kind of
'everybody does their own handler' logic.q
Oscar is currently prototyping a new pt walker that will batch entries
(e.g., folio ranges, pte none ares), and not use indirect calls. The
primary focus will will read-traversal, but nothing speaks against
modifications (likely helpers for installing pte_none()->marker could be
handy, and just creating page tables if we hit pmd_none() etc.).
Not sure yet how many use cases we can cover with the initial approach.
But the idea is to start with something that works for many cases, to
then gradually improve it.
I guess I felt it was more _dangerous_ as you are establishing _new_
mappings here, with the page tables being constructed for you up to the PTE
level.
And wanted to 'lock things down' somewhat.
But indeed, all this cries out for a need for a more generalised, robust
interface that handles some of what the downstream users of this are doing.
When setting things are a little different, I'd rather not open up things to a
user being able to do *whatever*, but rather limit to the smallest scope
possible for installing the PTE.
Understandable, but personally I think it will lead to potential misunderstandings:
- it will get copy/pasted as an example of how to set a pte (which is wrong;
you have to use set_pte_at()/set_ptes()). There is currently only a single other
case of direct dereferencing a pte to set it (in write_protect_page()).
Yeah, at least renaming the param could help, as 'ptep' implies you really
do have a pointer to the page table entry.
If we didn't return an error we could just return the PTE value or
something... hm.
- new users of .install_pte may assume (like I did) that the passed in ptep is
pointing to the pgtable and they will manipulate it with arch helpers. arm64
arch helpers all assume they are only ever passed pointers into pgtable memory.
It will end horribly if that is not the case.
It will end very horribly indeed :P or perhaps with more of a fizzle than
anticipated...
Yes, I'm hoping we can avoid new users with the old api ... :)
And also of course, it allows us to _mandate_ that set_pte_at() is used so we do
the right thing re: arches :)
I could have named the parameter better though, in guard_install_pte_entry()
would be better to have called it 'new_pte' or something.
I'd suggest at least describing this in the documentation in pagewalk.h. Or
better yet, you could make the pte the return value for the function. Then it is
clear because you have no pointer. You'd lose the error code but the only user
of this currently can't fail anyway.
Haha and here you make the same point I did above... great minds :)
I mean yeah returning a pte would make it clearer what you're doing, but
then it makes it different from every other callback... but this already is
different :)
I do very much want the ability to return an error value to stop the walk
(if you return >0 you can indicate to caller that a non-error stop occurred
for instance, something I use on the reading side).
But we do need to improve this one way or another, at the very least the
documentation/comments.
David - any thoughts?
Maybe document "don't use this until we have something better" :D
I'm not necessarily against just making this consitent, but I like this
property of us controlling what happens instead of just giving a pointer
into the page table - the principle of exposing the least possible.
ANWYAY, I will add to my ever expanding whiteboard TODO list [literally the
only todo that work for me] to look at this, will definitely improve docs
at very least.
Maybe we can talk at LSF/MM about this. I assume Oscar might be
partially covering that in his hugetlb talk.
--
Cheers,
David / dhildenb