Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Sep 09, 2023 at 05:18:13PM +0200, Arnd Bergmann wrote:
> 
> I think a pointer to '__u64' is the most appropriate here,
> that is compatible between 32-bit and 64-bit architectures
> and covers all addresses until we get architectures with
> 128-bit addressing.
> 
> Thinking about it more, I noticed an existing bug in
> both sys_move_pages and your current version of
> sys_move_phys_pages: the 'pages' array is in fact not
> suitable for compat tasks and needs an is_compat_task
> check to load a 32-bit address from compat userspace on
> the "if (get_user(p, pages + i))" line.
>

I'll clean up the current implementation for what I have on a v2 of an
RFC, and then look at adding some pull-ahead patches to fix both
move_pages and move_phys_pages for compat processes.  Might take me a
bit, I've only done compat work once before and I remember it being
annoying to get right.

I did see other work on migrate.c hanging around on the list, I'll
double check this hasn't already been discovered/handled.

> > on address vs pfn:
> >
> > Would using PFNs cause issues with portability of userland code? User
> > code that gets access to a physical address would have to convert
> > that to a PFN, which would be kernel-defined.  That could be easy
> > enough if the kernel exposed the shift size somewhere.
> 
> I don't think we currently use PFN anywhere in the syscall
> ABI, so this introduces a new basic type into uapi headers that
> is currently kernel internal.
> 
> A 32-bit PFN is always sufficient to represent all physical
> addresses on native 32-bit kernel, but not necessarily for
> compat user space on 64-bit kernels with an address space wider
> than 44 bits (16 terabyte address range).
> 
> For the interface it would also require the same quirk
> for compat tasks that I pointed out above that is missing
> in sys_move_pages today.
> 
> A minor quirk of PFN values is that they require knowing
> the page size to convert addresses, but I suppose you need
> that anyway.
>

Hm.  So, based on the 5 sources I listed below, some can be PFN and
others can be phys-addr.  To validate a physical address and
ensure the page is online, we have to convert to PFN anyway.

So for the user's sake, lets consider

Source Data: PFN
User Actions:
    1) If interface requires phys-addr, convert to phys-addr
    2) If interface allows PFN, simply pass in as-is
Kernel Actions:
    0) If Phys-Addr, convert to PFN
    1) Validate the PFN (no bits beyond architecture length)
    2) Get and validate struct page

Source Data: Phys-Addr
User Actions:
    1) If interface requires phys-addr, simply pass in as-is
    2) If interface requries PFN, convert to PFN
Kernel Actions:
    0) If Phys-addr, convert to PFN
    1) Validate the PFN (no bits beyond arch length)
    2) Get and validate struct page

I think the take-away here is that the kernel has to validate the PFN
regardless, and the kernel already has the information to do that.

If you want to use both PFN (Idle Bit) and Phys Addr (IBS/PEBS)
information, then the user needs the information to do page shifts
That requires more exposure of kernel information, and is possibly
complicated by things like hugepages.  It seems better to simply
allow the interface to take both values, and add flags which
format is being used.

The worst possible result of malformed input is that the wrong page
is migrated.  That's no different than move_phys_pages(/dev/random),
and the user already requires CAP_SYS_NICE or _ADMIN, so there are
probably worse things they could accomplish.

Additionally, this allows the page buffer to simply use __u64*,
and no need to expose a new user type for PFN.

Now the interactions look like:

Source Data: Phys-Addr or PFN
User Actions:
    0) move_phys_pages([phys-addr | pfn], MPOL_MF_(PHYS_ADDR ^ PFN))
Kernel Actions:
    0) If Phys-addr, convert to PFN
    1) Validate the PFN (no bits beyond arch length)
    2) Get and validate struct page

This only requires plumbing new 2 flags through do_pages_move, and no
new user-exposed types or information.

Is there an ick-factor with the idea of adding the following?

MPOL_MF_PHYS_ADDR : Treat page migration addresses as physical
MPOL_MF_PFN : Treat page migration addresses as PFNs

> 
> These do not seem to be problematic from the ASLR perspective, so
> I guess it may still be useful without CAP_SYS_ADMIN.
> 
>      Arnd

After reviewing the capabilities documentation it seems like
CAP_SYS_NICE is the appropriate capability.  My last meassage I said
CAP_SYS_ADMIN was probably correct, but I think using CAP_SYS_NICE
is more appropriate unless there are strong feelings due to the use of
PFN and Physcall Address.

I'm not sure rowhammer is of great concern in this interface because you
can't choose the destination address, only the destination node. Though
I suppose someone could go nuts and try to "massage" a node in some way
to get a statistical likelihood of placement (similar heap grooming).

~Gregory



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux