On Sat, Sep 09, 2023 at 05:18:13PM +0200, Arnd Bergmann wrote: > > I think a pointer to '__u64' is the most appropriate here, > that is compatible between 32-bit and 64-bit architectures > and covers all addresses until we get architectures with > 128-bit addressing. > > Thinking about it more, I noticed an existing bug in > both sys_move_pages and your current version of > sys_move_phys_pages: the 'pages' array is in fact not > suitable for compat tasks and needs an is_compat_task > check to load a 32-bit address from compat userspace on > the "if (get_user(p, pages + i))" line. > I'll clean up the current implementation for what I have on a v2 of an RFC, and then look at adding some pull-ahead patches to fix both move_pages and move_phys_pages for compat processes. Might take me a bit, I've only done compat work once before and I remember it being annoying to get right. I did see other work on migrate.c hanging around on the list, I'll double check this hasn't already been discovered/handled. > > on address vs pfn: > > > > Would using PFNs cause issues with portability of userland code? User > > code that gets access to a physical address would have to convert > > that to a PFN, which would be kernel-defined. That could be easy > > enough if the kernel exposed the shift size somewhere. > > I don't think we currently use PFN anywhere in the syscall > ABI, so this introduces a new basic type into uapi headers that > is currently kernel internal. > > A 32-bit PFN is always sufficient to represent all physical > addresses on native 32-bit kernel, but not necessarily for > compat user space on 64-bit kernels with an address space wider > than 44 bits (16 terabyte address range). > > For the interface it would also require the same quirk > for compat tasks that I pointed out above that is missing > in sys_move_pages today. > > A minor quirk of PFN values is that they require knowing > the page size to convert addresses, but I suppose you need > that anyway. > Hm. So, based on the 5 sources I listed below, some can be PFN and others can be phys-addr. To validate a physical address and ensure the page is online, we have to convert to PFN anyway. So for the user's sake, lets consider Source Data: PFN User Actions: 1) If interface requires phys-addr, convert to phys-addr 2) If interface allows PFN, simply pass in as-is Kernel Actions: 0) If Phys-Addr, convert to PFN 1) Validate the PFN (no bits beyond architecture length) 2) Get and validate struct page Source Data: Phys-Addr User Actions: 1) If interface requires phys-addr, simply pass in as-is 2) If interface requries PFN, convert to PFN Kernel Actions: 0) If Phys-addr, convert to PFN 1) Validate the PFN (no bits beyond arch length) 2) Get and validate struct page I think the take-away here is that the kernel has to validate the PFN regardless, and the kernel already has the information to do that. If you want to use both PFN (Idle Bit) and Phys Addr (IBS/PEBS) information, then the user needs the information to do page shifts That requires more exposure of kernel information, and is possibly complicated by things like hugepages. It seems better to simply allow the interface to take both values, and add flags which format is being used. The worst possible result of malformed input is that the wrong page is migrated. That's no different than move_phys_pages(/dev/random), and the user already requires CAP_SYS_NICE or _ADMIN, so there are probably worse things they could accomplish. Additionally, this allows the page buffer to simply use __u64*, and no need to expose a new user type for PFN. Now the interactions look like: Source Data: Phys-Addr or PFN User Actions: 0) move_phys_pages([phys-addr | pfn], MPOL_MF_(PHYS_ADDR ^ PFN)) Kernel Actions: 0) If Phys-addr, convert to PFN 1) Validate the PFN (no bits beyond arch length) 2) Get and validate struct page This only requires plumbing new 2 flags through do_pages_move, and no new user-exposed types or information. Is there an ick-factor with the idea of adding the following? MPOL_MF_PHYS_ADDR : Treat page migration addresses as physical MPOL_MF_PFN : Treat page migration addresses as PFNs > > These do not seem to be problematic from the ASLR perspective, so > I guess it may still be useful without CAP_SYS_ADMIN. > > Arnd After reviewing the capabilities documentation it seems like CAP_SYS_NICE is the appropriate capability. My last meassage I said CAP_SYS_ADMIN was probably correct, but I think using CAP_SYS_NICE is more appropriate unless there are strong feelings due to the use of PFN and Physcall Address. I'm not sure rowhammer is of great concern in this interface because you can't choose the destination address, only the destination node. Though I suppose someone could go nuts and try to "massage" a node in some way to get a statistical likelihood of placement (similar heap grooming). ~Gregory