On 2/14/23 7:12 PM, John David Anglin wrote:
> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>> * John David Anglin<dave.anglin@xxxxxxxx>:
>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>> here.
>>>>>>
>>>>>>> Thanks for the patch!
>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>> Interestingly it worked once (not reproducible) directly after bootup,
>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>
>>>>>>> So, still needs more debugging/testing.
>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>> that's what it seems like.
>>>>> True.
>>>>>
>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>> kernel side.
>>> The patch below seems to fix the issue.
>>>
>>> I've successfully tested it with the io_uring-test testcase on
>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>
>>> The idea is similar to how a file is mmapped shared by two
>>> userspace processes by keeping the lower bits of the virtual address
>>> the same.
>>>
>>> Cache flushes from userspace don't seem to be needed.
>> Are they from the kernel side, if the lower bits mean we end up
>> with the same coloring? Because I think this is a bit of a big
>> hammer, in terms of overhead for flushing. As an example, on arm64
>> that is perfectly fine with the existing code, it's about a 20-25%
>> performance hit.
>
> The io_uring-test testcase still works on rp3440 with the kernel
> flushes removed.

That's what I suspected, the important bit here is just aligning it for
identical coloring. Can you confirm if the below works for you? Had to
fiddle it a bit to get it to work without coloring.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..1d4562067949 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -72,6 +72,7 @@
 #include <linux/io_uring.h>
 #include <linux/audit.h>
 #include <linux/security.h>
+#include <asm/shmparam.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }
 
+static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+	struct vm_unmapped_area_info info;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+	/* we do not support requesting a specific address */
+	if (addr)
+		return -EINVAL;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
+	info.align_mask = PAGE_MASK;
+	info.align_offset = (unsigned long) ptr;
+#ifdef SHM_COLOUR
+	info.align_mask &= (SHM_COLOUR - 1);
+	info.align_offset &= (SHM_COLOUR - 1);
+#endif
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	addr = vm_unmapped_area(&info);
+	if (offset_in_page(addr)) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */
 
 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3414,6 +3460,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif
 	.poll = io_uring_poll,
 #ifdef CONFIG_PROC_FS

-- 
Jens Axboe