Re: [PATCH RFC 01/10] vmalloc: Add basic perm alloc implementation

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Mon, 23 Nov 2020 20:44:12 +0000

On Mon, 2020-11-23 at 09:00 +0000, Christoph Hellwig wrote:
> First thanks for doing this, having a vmalloc variant that starts out
> with proper permissions has been on my todo list for a while.
> 
> > +#define PERM_R	1
> > +#define PERM_W	2
> > +#define PERM_X	4
> > +#define PERM_RWX	(PERM_R | PERM_W | PERM_X)
> > +#define PERM_RW		(PERM_R | PERM_W)
> > +#define PERM_RX		(PERM_R | PERM_X)
> 
> Why can't this use the normal pgprot flags?
> 

Well, there were two reasons:
1. Non-standard naming for the PAGE_FOO flags. For example,
PAGE_KERNEL_ROX vs PAGE_KERNEL_READ_EXEC. This could be unified. I
think it's just riscv that breaks the conventions. Others are just
missing some.

2. The need to translate between the flags and set_memory_foo() calls.
For example if a permission is RW and the caller is asking to change it
to RWX. Some architectures have an X permission and others an NX
permission, and it's the same with read only vs writable. So these
flags are trying to be more analogous of the cross-arch set_memory_()
function names rather than pgprot flags.

I guess you could do something like (pgprot_val(PAGE_KERNEL_EXEC) &
~pgprot_val(PAGE_KERNEL)) and assume if there are any bits set it is a
positive permission and from that deduce whether to call set_memory_nx() or set_memory_x().

But I thought that using those pgprot flags was still sort overloading
the meaning of pgprot. My understanding was that it is supposed to hold
the actual bits set in the PTE. For example large pages or TLB hints
(like PAGE_KERNEL_EXEC_CONT) could set or unset extra bits, so asking
for PAGE_KERNEL_EXEC wouldn't necessarily mean "set these bits in all
of the PTEs", it could mean something more like "infer what I want from
these bits and do that".

x86's cpa will also avoid changing NX if it is not supported, so if the
caller asked for PAGE_KERNEL->PAGE_KERNEL_EXEC in perm_change() it
should not necessarily bother setting all of the PAGE_KERNEL_EXEC bits
in the actual PTEs. Asking for PERM_RW->PERM_RWX on the other hand,
would let the implementation do whatever it needs to set the memory
executable, like set_memory_x() does. It should work either way but
seems like the expectations would be a little clearer with the PERM_
flags.

On the other hand, creating a whole new set of flags is not ideal
either. But that was just my reasoning. Does it seem worth it?

> > +typedef u8 virtual_perm;
> 
> This would need __bitwise annotations to allow sparse to typecheck
> the
> flags.
> 

Ok, thanks.

> > +/*
> > + * Allocate a special permission kva region. The region may not be
> > mapped
> > + * until a call to perm_writable_finish(). A writable region will
> > be mapped
> > + * immediately at the address returned by perm_writable_addr().
> > The allocation
> > + * will be made between the start and end virtual addresses.
> > + */
> > +struct perm_allocation *perm_alloc(unsigned long vstart, unsigned
> > long vend, unsigned long page_cnt,
> > +				   virtual_perm perms);
> 
> Please avoid totally pointless overly long line (all over the series)

Could easily wrap this one, but just to clarify, do you mean lines over
80 chars? There were already some over 80 in vmalloc before the move to
100 chars, so figured it was ok to stretch out now.

> Also I find the unsigned long for kernel virtual address interface
> strange, but I'll take a look at the callers later.

Yea, some of the callers need to cast either way. I think I changed it
to unsigned long, because casting (void *) was smaller in the code than
(unsigned long) and it shorted some line lengths.