* John David Anglin <dave.anglin@xxxxxxxx>:
> On 2023-02-13 5:05 p.m., Helge Deller wrote:
> > On 2/13/23 22:05, Jens Axboe wrote:
> > > On 2/13/23 1:59?PM, Helge Deller wrote:
> > > > > Yep sounds like it. What's the caching architecture of parisc?
> > > >
> > > > parisc is Virtually Indexed, Physically Tagged (VIPT).
> > >
> > > That's what I assumed, so virtual aliasing is what we're dealing with
> > > here.
> > >
> > > > Thanks for the patch!
> > > > Sadly it doesn't fix the problem, as the kernel still sees
> > > > ctx->rings->sq.tail as being 0.
> > > > Interestingly it worked once (not reproduceable) directly after bootup,
> > > > which indicates that we at least look at the right address from kernel side.
> > > >
> > > > So, still needs more debugging/testing.
> > >
> > > It's not like this is untested stuff, so yeah it'll generally be
> > > correct, it just seems that parisc is a bit odd in that the virtual
> > > aliasing occurs between the kernel and userspace addresses too. At least
> > > that's what it seems like.
> >
> > True.
> >
> > > But I wonder if what needs flushing is the user side, not the kernel
> > > side? Either that, or my patch is not flushing the right thing on the
> > > kernel side.

The patch below seems to fix the issue. I've successfully tested it with
the io_uring-test testcase on physical parisc machines with 32- and
64-bit 6.1.11 kernels.

The idea is similar to how a file is mmapped shared by two userspace
processes: the lower bits of the virtual address are kept the same.
Cache flushes from userspace don't seem to be needed.

I think similar code is needed for mips (which uses SHMLBA 0x40000) and
some other architectures....

Helge

From efde7ed7ad380a924448b8ab8ea30d52782aa8e6 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller@xxxxxx>
Date: Tue, 14 Feb 2023 23:41:14 +0100
Subject: [PATCH] io_uring: DRAFT Fix io_uring on machines with VIPT caches

This is a DRAFT patch to make io_uring work on machines with VIPT
caches (like PA-RISC). It currently only compiles on parisc, because it
uses the SHM_COLOUR constant.

The basic idea is to ensure that the page colour matches between the
kernel ring address and the mmap'ed userspace address, and to flush the
caches before accessing the rings.
Signed-off-by: Helge Deller <deller@xxxxxx>

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 862e05e6691d..606e23671453 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2208,6 +2208,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 	unsigned head, mask = ctx->sq_entries - 1;
 	unsigned sq_idx = ctx->cached_sq_head++ & mask;
 
+	flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx));
+
 	/*
 	 * The cached sq head (or cq tail) serves two purposes:
 	 *
@@ -2238,6 +2240,9 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	unsigned int left;
 	int ret;
 
+	struct io_rings *rings = ctx->rings;
+	flush_dcache_page(virt_to_page(rings));
+
 	if (unlikely(!entries))
 		return 0;
 	/* make sure SQ entry isn't read before tail */
@@ -3059,6 +3067,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }
 
+unsigned long
+io_uring_mmu_get_unmapped_area(struct file *filp, unsigned long addr,
+			       unsigned long len, unsigned long pgoff,
+			       unsigned long flags)
+{
+	struct mm_struct *mm = current->mm;
+	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+	struct vm_unmapped_area_info info;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+
+	/* we do not support requesting a specific address */
+	if (addr)
+		return -EINVAL;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);
+	info.align_mask = PAGE_MASK & (SHM_COLOUR - 1);
+	info.align_offset = (unsigned long)ptr & (SHM_COLOUR - 1);
+
+	addr = vm_unmapped_area(&info);
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	if (offset_in_page(addr)) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */
 
 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3273,6 +3326,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area	= io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities	= io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area	= io_uring_mmu_get_unmapped_area,
 #endif
 	.poll			= io_uring_poll,
 #ifdef CONFIG_PROC_FS
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 90b675c65b84..b8bc682ef240 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -204,6 +204,7 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
 static inline void io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	/* order cqe stores with ring update */
+	flush_dcache_page(virt_to_page(ctx->rings));
 	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
 }
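
For reference, here is a minimal sketch of the page-colour idea behind the
align_mask/align_offset lines in the patch above. It is illustrative only:
SHM_COLOUR is the parisc value from arch/parisc/include/asm/shmparam.h, and
same_colour() is a hypothetical helper, not kernel code.

/* Standalone C sketch; nothing here touches the kernel. */
#include <stdbool.h>
#include <stdint.h>

#define SHM_COLOUR	0x00400000UL	/* parisc colour window; mips uses SHMLBA 0x40000 */

/*
 * On a VIPT cache, a kernel mapping and a userspace mapping of the same
 * ring page only avoid aliasing problems when they agree in the virtual
 * index bits, i.e. in their offset within a SHM_COLOUR-sized window.
 */
static bool same_colour(uintptr_t kernel_ring_addr, uintptr_t user_addr)
{
	return (kernel_ring_addr & (SHM_COLOUR - 1)) ==
	       (user_addr & (SHM_COLOUR - 1));
}

int main(void)
{
	/* Example: 0x40012000 and 0xf8012000 share the low 22 bits -> same colour. */
	return same_colour(0x40012000UL, 0xf8012000UL) ? 0 : 1;
}

If I read vm_unmapped_area() right, setting align_mask = PAGE_MASK &
(SHM_COLOUR - 1) and align_offset = ptr & (SHM_COLOUR - 1) makes it return
only addresses whose colour bits match the kernel ring address ptr, so
same_colour(ptr, addr) should hold for the mmap'ed region.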