Re: io_uring failure on parisc with VIPT caches

On 2/14/23 7:12 PM, John David Anglin wrote:
> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>> * John David Anglin<dave.anglin@xxxxxxxx>:
>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>> here.
>>>>>>
>>>>>>> Thanks for the patch!
>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>> Interestingly it worked once (not reproducible) directly after bootup,
>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>
>>>>>>> So, still needs more debugging/testing.
>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>> that's what it seems like.
>>>>> True.
>>>>>
>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>> kernel side.
>>> The patch below seems to fix the issue.
>>>
>>> I've successfully tested it with the io_uring-test testcase on
>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>
>>> The idea is similar to how a file is mmapped shared by two
>>> userspace processes by keeping the lower bits of the virtual address
>>> the same.
>>>
>>> Cache flushes from userspace don't seem to be needed.
>> Are they from the kernel side, if the lower bits mean we end up
>> with the same coloring? Because I think this is a bit of a big
>> hammer, in terms of overhead for flushing. As an example, on arm64
>> that is perfectly fine with the existing code, it's about a 20-25%
>> performance hit.
>
> The io_uring-test testcase still works on rp3440 with the kernel
> flushes removed.

That's what I suspected: the important bit here is just aligning it for
identical coloring. Can you confirm if the below works for you? I had to
fiddle with it a bit to get it to work without coloring.
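
As a rough illustration of what "identical coloring" means here, the
standalone userspace sketch below picks a user address whose colour bits
match a kernel address. It is not kernel code. The names cache_colour(),
pick_same_colour(), ASSUMED_PAGE_SIZE and ASSUMED_COLOUR_SPAN are
hypothetical, and the 4 KiB page size and 4 MiB colour span are
assumptions made for the example (the real values come from PAGE_SIZE
and SHM_COLOUR on the architecture in question).

```c
#include <assert.h>

/* Assumptions for illustration only; not taken from the patch. */
#define ASSUMED_PAGE_SIZE   0x1000UL      /* 4 KiB pages */
#define ASSUMED_COLOUR_SPAN 0x00400000UL  /* 4 MiB aliasing window */

/* Colour = the page-granular address bits below the colour span. */
static unsigned long cache_colour(unsigned long addr)
{
	return addr & (ASSUMED_COLOUR_SPAN - 1) & ~(ASSUMED_PAGE_SIZE - 1);
}

/*
 * Return the highest page-aligned address <= limit that has the same
 * colour as kaddr, or 0 if no such address exists below limit.
 * This mimics a top-down search in spirit only.
 */
static unsigned long pick_same_colour(unsigned long limit, unsigned long kaddr)
{
	unsigned long want = cache_colour(kaddr);
	unsigned long base = limit & ~(ASSUMED_COLOUR_SPAN - 1);
	unsigned long addr = base + want;

	if (addr > limit) {
		if (base == 0)
			return 0;	/* nothing suitable below limit */
		addr -= ASSUMED_COLOUR_SPAN;
	}

	/* Both mappings now index the same cache lines in a VIPT cache. */
	assert(cache_colour(addr) == want);
	return addr;
}
```

With these assumed values, a kernel address of 0x40123000 has colour
0x123000, so any user address ending in the same low 22 bits (page
offset aside) aliases to the same cache lines and should need no extra
flushing.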


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..1d4562067949 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -72,6 +72,7 @@
 #include <linux/io_uring.h>
 #include <linux/audit.h>
 #include <linux/security.h>
+#include <asm/shmparam.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }
 
+static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+	struct vm_unmapped_area_info info;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+	/* we do not support requesting a specific address */
+	if (addr)
+		return -EINVAL;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
+	info.align_mask = PAGE_MASK;
+	info.align_offset = (unsigned long) ptr;
+#ifdef SHM_COLOUR
+	info.align_mask &= (SHM_COLOUR - 1);
+	info.align_offset &= (SHM_COLOUR - 1);
+#endif
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	addr = vm_unmapped_area(&info);
+	if (offset_in_page(addr)) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */
 
 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3414,6 +3460,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif
 	.poll		= io_uring_poll,
 #ifdef CONFIG_PROC_FS

-- 
Jens Axboe



