Hi Matoro,

* matoro <matoro_mailinglist_kernel@xxxxxxxxx>:
> On 2023-03-14 13:16, Jens Axboe wrote:
> > From: Helge Deller <deller@xxxxxx>
> >
> > Some architectures have memory cache aliasing requirements (e.g. parisc)
> > if memory is shared between userspace and kernel. This patch fixes the
> > kernel to return an aliased address when asked by userspace via mmap().
> >
> > Signed-off-by: Helge Deller <deller@xxxxxx>
> > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > ---
> >  io_uring/io_uring.c | 51 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> >
> > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > index 722624b6d0dc..3adecebbac71 100644
> > --- a/io_uring/io_uring.c
> > +++ b/io_uring/io_uring.c
> > @@ -72,6 +72,7 @@
> >  #include <linux/io_uring.h>
> >  #include <linux/audit.h>
> >  #include <linux/security.h>
> > +#include <asm/shmparam.h>
> >
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/io_uring.h>
> > @@ -3317,6 +3318,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
> >  	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
> >  }
> >
> > +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> > +			unsigned long addr, unsigned long len,
> > +			unsigned long pgoff, unsigned long flags)
> > +{
> > +	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
> > +	struct vm_unmapped_area_info info;
> > +	void *ptr;
> > +
> > +	/*
> > +	 * Do not allow to map to user-provided address to avoid breaking the
> > +	 * aliasing rules. Userspace is not able to guess the offset address of
> > +	 * kernel kmalloc()ed memory area.
> > +	 */
> > +	if (addr)
> > +		return -EINVAL;
> > +
> > +	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
> > +	if (IS_ERR(ptr))
> > +		return -ENOMEM;
> > +
> > +	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> > +	info.length = len;
> > +	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> > +	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
> > +#ifdef SHM_COLOUR
> > +	info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
> > +#else
> > +	info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
> > +#endif
> > +	info.align_offset = (unsigned long) ptr;
> > +
> > +	/*
> > +	 * A failed mmap() very likely causes application failure,
> > +	 * so fall back to the bottom-up function here. This scenario
> > +	 * can happen with large stack limits and large mmap()
> > +	 * allocations.
> > +	 */
> > +	addr = vm_unmapped_area(&info);
> > +	if (offset_in_page(addr)) {
> > +		info.flags = 0;
> > +		info.low_limit = TASK_UNMAPPED_BASE;
> > +		info.high_limit = mmap_end;
> > +		addr = vm_unmapped_area(&info);
> > +	}
> > +
> > +	return addr;
> > +}
> > +
> >  #else /* !CONFIG_MMU */
> >
> >  static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
> > @@ -3529,6 +3578,8 @@ static const struct file_operations io_uring_fops = {
> >  #ifndef CONFIG_MMU
> >  	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
> >  	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
> > +#else
> > +	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
> >  #endif
> >  	.poll = io_uring_poll,
> >  #ifdef CONFIG_PROC_FS
>
> Hi Jens, Helge - I've bisected a regression with io_uring on ia64 to this
> patch in 6.4.  Unfortunately this breaks userspace programs using io_uring,
> the easiest one to test is cmake with an io_uring enabled libuv (i.e.,
> libuv >= 1.45.0) which will hang.
>
> I am aware that ia64 is in a vulnerable place right now, which is why I am
> keeping this spread limited.  Since this clearly involves
> architecture-specific changes for parisc,

It isn't so much architecture-specific... (just one ifdef)

> is there any chance of looking at
> what is required to do the same for ia64?  I looked at
> 0ef36bd2b37815719e31a72d2beecc28ca8ecd26 ("parisc: change value of SHMLBA
> from 0x00400000 to PAGE_SIZE") and tried to replicate the SHMLBA ->
> SHM_COLOUR change, but it made no difference.
>
> If hardware is necessary for testing, I can provide it, including remote BMC
> access for restarts/kernel debugging.  Any takers?

I won't have time to test myself, but maybe you could?

Basically we should try to find out why io_uring_mmu_get_unmapped_area()
doesn't return valid addresses on ia64, while arch_get_unmapped_area()
[in arch/ia64/kernel/sys_ia64.c] does.

You could apply the patch below first. It introduces a memory leak (as it
requests memory twice), but maybe it gives us an idea. The ia64
arch_get_unmapped_area() searches for memory bottom-up (flags=0), while the
io_uring function tries top-down first; maybe that's the problem. And I
don't understand the offset_in_page() check right now.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3bca7a79efda..93b1964d2bbb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3431,13 +3431,17 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
+	/* compare to arch_get_unmapped_area() in arch/ia64/kernel/sys_ia64.c */
 	addr = vm_unmapped_area(&info);
-	if (offset_in_page(addr)) {
+	printk("io_uring_mmu_get_unmapped_area() address 1 is: 0x%lx\n", addr);
+	addr = 0;
+	if (!addr) {
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = mmap_end;
 		addr = vm_unmapped_area(&info);
 	}
+	printk("io_uring_mmu_get_unmapped_area() returns address 0x%lx\n", addr);
 
 	return addr;
 }

Another option is to disable the io_uring_mmu_get_unmapped_area() hook with
the second patch below; the kernel then falls back to the arch's default
get_unmapped_area().  Maybe you could also add printk()s to ia64's
arch_get_unmapped_area() and check what it returns there?

@@ -3654,6 +3658,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#elif 0 /* IS_ENABLED(CONFIG_IA64) */
+	.get_unmapped_area = NULL,
 #else
 	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif

Helge
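
P.S.: In case it helps with testing: instead of going through cmake/libuv, a
minimal user-space program that exercises exactly this code path is to set up
a tiny ring and mmap() the SQ ring by hand, then print the address the kernel
hands back (to compare with the debug printk()s above). Untested sketch; it
assumes your kernel headers provide __NR_io_uring_setup and IORING_OFF_SQ_RING:

/* Untested sketch: map the io_uring SQ ring directly and print the address
 * the kernel chose via get_unmapped_area(). */
#include <linux/io_uring.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct io_uring_params p;
	size_t sz;
	void *sq;
	int fd;

	memset(&p, 0, sizeof(p));
	fd = syscall(__NR_io_uring_setup, 4, &p);	/* 4 SQ entries */
	if (fd < 0) {
		perror("io_uring_setup");
		return 1;
	}

	/* SQ ring size as reported by the kernel in sq_off/sq_entries */
	sz = p.sq_off.array + p.sq_entries * sizeof(unsigned int);
	sq = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
	if (sq == MAP_FAILED) {
		perror("mmap SQ ring");
		return 1;
	}
	printf("SQ ring mapped at %p (len %zu)\n", sq, sz);
	return 0;
}

If the mmap() fails outright, or the printed address looks bogus compared to
what the printk()s report, that would point at the address which
io_uring_mmu_get_unmapped_area() picks rather than at libuv.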
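
P.P.S.: For background on what the align_mask/align_offset pair in the
original patch enforces: vm_unmapped_area() returns a user address whose
"colour" bits match those of the kernel ring buffer, so both mappings fall
into the same cache colour. A small illustration, with values hard-coded
just for the example (4 KiB pages and the parisc SHM_COLOUR of 0x00400000;
the addresses are made up):

/* Illustration only: two addresses alias safely on parisc if the bits
 * selected by PAGE_MASK & (SHM_COLOUR - 1) are equal, which is the
 * constraint vm_unmapped_area() enforces via align_mask/align_offset. */
#include <stdio.h>

#define EX_PAGE_SIZE	4096UL		/* assumed page size */
#define EX_SHM_COLOUR	0x00400000UL	/* parisc SHM_COLOUR */

static int same_colour(unsigned long a, unsigned long b)
{
	unsigned long mask = ~(EX_PAGE_SIZE - 1) & (EX_SHM_COLOUR - 1);

	return (a & mask) == (b & mask);
}

int main(void)
{
	unsigned long kernel_ptr = 0x40000123000UL;	/* made-up kernel address */
	unsigned long user_addr  = 0x2000000123000UL;	/* made-up user address  */

	printf("same colour: %d\n", same_colour(kernel_ptr, user_addr));
	return 0;
}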