Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/13/23 17:15, Jens Axboe wrote:
On 2/12/23 3:31?PM, Helge Deller wrote:
On 2/12/23 23:20, Helge Deller wrote:
On 2/12/23 22:48, Jens Axboe wrote:
On 2/12/23 1:01?PM, Helge Deller wrote:
On 2/12/23 20:42, Jens Axboe wrote:
On 2/12/23 12:35?PM, Helge Deller wrote:
On 2/12/23 15:03, Helge Deller wrote:
On 2/12/23 14:35, Jens Axboe wrote:
On 2/12/23 6:28?AM, Helge Deller wrote:
On 2/12/23 14:16, Jens Axboe wrote:
On 2/12/23 2:47?AM, Helge Deller wrote:
Hi all,

We see io-uring failures on the parisc architecture with this testcase:
https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c

parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.

On a 64-bit kernel (6.1.11):
deller@parisc:~$ ./io_uring-test test.file
ret=0, wanted 4096
Submitted=4, completed=1, bytes=0
-> failure

strace shows:
io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
brk(NULL)                               = 0x4ae000
brk(0x4cf000)                           = 0x4cf000
io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0


Running the same testcase on a 32-bit kernel (6.1.11) works:
root@debian:~# ./io_uring-test test.file
Submitted=4, completed=4, bytes=16384
-> ok.

strace:
io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
brk(NULL)                               = 0x15000
brk(0x36000)                            = 0x36000
io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4

I'm happy to test any patch if someone has an idea....

No idea what this could be, to be honest. I tried your qemu vm image,
and it does boot, but it's missing keys to be able to update apt and
install packages... After fiddling with this for 30 min I gave up, any
chance you can update the sid image? Given how slow this thing is
running, it'd take me all day to do a fresh install and I have to admit
I'm not THAT motivated about parisc to do that :)

Yes, I will update that image, but qemu currently only supports a
32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
update it, you won't be able to reproduce it, as it only happens with
the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
translation in the kernel, which triggers only big-endian machines.

I built my own kernel for it, so that should be fine, correct?

No, as qemu won't boot the 64-bit kernel.

We'll see soon enough, managed to disable enough checks on the
debian-10 image to actually make it install packages.

Does powerpc with a 64-bit ppc64 kernel work?
I'd assume it will show the same issue.

No idea... Only stuff I use and test on is x86-64/32 and arm64.

Would be interesting if someone could test...

I will try to add some printks and compare the output of 32- and
64-bit kernels. If you have some suggestion where to add such (which?)
debug code, it would help me a lot.

I'd just try:

echo 1 > /sys/kernel/debug/tracing/events/io_uring

I'll try, but will take some time...


At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
ctx->cached_sq_head is 0 in both cases.

cached_sq_head will get updated as sqes are consumed, but since sq.tail
is zero, there's nothing to submit as far as io_uring is concerned.

Can you dump addresses/offsets of the sq and cq heads/tails in userspace
and in the kernel? They are u32, so same size of 32 and 64-bit.

For both kernels (32- and 64-bit) I get:
p->sq_off.head = 0  p->sq_off.tail = 16
p->cq_off.head = 32  p->cq_off.tail = 48

So all that looks as expected. Is it perhaps some mmap thing on 64-bit
kernels? The kernel isn't seeing the updates. You could add the below
debugging, and keep your kernel side stuff. Sounds like they don't quite
agree.


diff --git a/examples/io_uring-test.c b/examples/io_uring-test.c
index 1a685360bff6..f1cfda90c018 100644
--- a/examples/io_uring-test.c
+++ b/examples/io_uring-test.c
@@ -73,7 +73,9 @@ int main(int argc, char *argv[])
               break;
       } while (1);

+    printf("pre-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
       ret = io_uring_submit(&ring);
+    printf("post-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
       if (ret < 0) {
           fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
           return 1;

Result is:
pre-submit sq head/tail 0/0, 0/4
..
post-submit sq head/tail 0/4, 4/4

I have to correct myself!
The problem exists on both, 32- and 64-bit kernels.
My current testing with 32-bit kernel was on qemu, but after booting the same kernel
on a physical box, I see the testcase failing on the 32-bit kernel too.

So, probably some cache-flushing / alias-handling is needed....
This is a 1-CPU box, so SMP isn't involved.

Yep sounds like it. What's the caching architecture of parisc?

parisc is Virtually Indexed, Physically Tagged (VIPT).


Something
like this perhaps, but may not be complete as we'd need to do something
for the cqe writes too I think, not just the cq tail.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..ab0d1297bb0e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2338,6 +2338,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
  	unsigned head, mask = ctx->sq_entries - 1;
  	unsigned sq_idx = ctx->cached_sq_head++ & mask;

+	flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx));
+
  	/*
  	 * The cached sq head (or cq tail) serves two purposes:
  	 *
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index ab4b2a1c3b7e..b132f44a9364 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -220,6 +220,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
  {
  	/* order cqe stores with ring update */
  	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
+	flush_dcache_page(virt_to_page(ctx->rings));
  }

  /* requires smb_mb() prior, see wq_has_sleeper() */

Thanks for the patch!
Sadly it doesn't fix the problem, as the kernel still sees
ctx->rings->sq.tail as being 0.
Interestingly it worked once (not reproduceable) directly after bootup,
which indicates that we at least look at the right address from kernel side.

So, still needs more debugging/testing.

Helge




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux