On 2/13/23 17:15, Jens Axboe wrote:
On 2/12/23 3:31?PM, Helge Deller wrote:On 2/12/23 23:20, Helge Deller wrote:On 2/12/23 22:48, Jens Axboe wrote:On 2/12/23 1:01?PM, Helge Deller wrote:On 2/12/23 20:42, Jens Axboe wrote:On 2/12/23 12:35?PM, Helge Deller wrote:On 2/12/23 15:03, Helge Deller wrote:On 2/12/23 14:35, Jens Axboe wrote:On 2/12/23 6:28?AM, Helge Deller wrote:On 2/12/23 14:16, Jens Axboe wrote:On 2/12/23 2:47?AM, Helge Deller wrote:Hi all, We see io-uring failures on the parisc architecture with this testcase: https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel. On a 64-bit kernel (6.1.11): deller@parisc:~$ ./io_uring-test test.file ret=0, wanted 4096 Submitted=4, completed=1, bytes=0 -> failure strace shows: io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3 mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000 mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000 openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4 statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0 getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4 brk(NULL) = 0x4ae000 brk(0x4cf000) = 0x4cf000 io_uring_enter(3, 4, 0, 0, NULL, 8) = 0 Running the same testcase on a 32-bit kernel (6.1.11) works: root@debian:~# ./io_uring-test test.file Submitted=4, completed=4, bytes=16384 -> ok. strace: io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3 mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000 mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000 openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4 statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0 getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4 brk(NULL) = 0x15000 brk(0x36000) = 0x36000 io_uring_enter(3, 4, 0, 0, NULL, 8) = 4 I'm happy to test any patch if someone has an idea....No idea what this could be, to be honest. I tried your qemu vm image, and it does boot, but it's missing keys to be able to update apt and install packages... After fiddling with this for 30 min I gave up, any chance you can update the sid image? Given how slow this thing is running, it'd take me all day to do a fresh install and I have to admit I'm not THAT motivated about parisc to do that :)Yes, I will update that image, but qemu currently only supports a 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I update it, you won't be able to reproduce it, as it only happens with the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit translation in the kernel, which triggers only big-endian machines.I built my own kernel for it, so that should be fine, correct?No, as qemu won't boot the 64-bit kernel.We'll see soon enough, managed to disable enough checks on the debian-10 image to actually make it install packages.Does powerpc with a 64-bit ppc64 kernel work? I'd assume it will show the same issue.No idea... Only stuff I use and test on is x86-64/32 and arm64.Would be interesting if someone could test...I will try to add some printks and compare the output of 32- and 64-bit kernels. If you have some suggestion where to add such (which?) debug code, it would help me a lot.I'd just try: echo 1 > /sys/kernel/debug/tracing/events/io_uringI'll try, but will take some time...At entry of io_submit_sqes(), io_sqring_entries() returns 0, because ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and ctx->cached_sq_head is 0 in both cases.cached_sq_head will get updated as sqes are consumed, but since sq.tail is zero, there's nothing to submit as far as io_uring is concerned. Can you dump addresses/offsets of the sq and cq heads/tails in userspace and in the kernel? They are u32, so same size of 32 and 64-bit.For both kernels (32- and 64-bit) I get: p->sq_off.head = 0 p->sq_off.tail = 16 p->cq_off.head = 32 p->cq_off.tail = 48So all that looks as expected. Is it perhaps some mmap thing on 64-bit kernels? The kernel isn't seeing the updates. You could add the below debugging, and keep your kernel side stuff. Sounds like they don't quite agree. diff --git a/examples/io_uring-test.c b/examples/io_uring-test.c index 1a685360bff6..f1cfda90c018 100644 --- a/examples/io_uring-test.c +++ b/examples/io_uring-test.c @@ -73,7 +73,9 @@ int main(int argc, char *argv[]) break; } while (1); + printf("pre-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail); ret = io_uring_submit(&ring); + printf("post-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail); if (ret < 0) { fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret)); return 1;Result is: pre-submit sq head/tail 0/0, 0/4 .. post-submit sq head/tail 0/4, 4/4I have to correct myself! The problem exists on both, 32- and 64-bit kernels. My current testing with 32-bit kernel was on qemu, but after booting the same kernel on a physical box, I see the testcase failing on the 32-bit kernel too. So, probably some cache-flushing / alias-handling is needed.... This is a 1-CPU box, so SMP isn't involved.Yep sounds like it. What's the caching architecture of parisc?
parisc is Virtually Indexed, Physically Tagged (VIPT).
Something like this perhaps, but may not be complete as we'd need to do something for the cqe writes too I think, not just the cq tail. diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index db623b3185c8..ab0d1297bb0e 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2338,6 +2338,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx) unsigned head, mask = ctx->sq_entries - 1; unsigned sq_idx = ctx->cached_sq_head++ & mask; + flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx)); + /* * The cached sq head (or cq tail) serves two purposes: * diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index ab4b2a1c3b7e..b132f44a9364 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -220,6 +220,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx) { /* order cqe stores with ring update */ smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail); + flush_dcache_page(virt_to_page(ctx->rings)); } /* requires smb_mb() prior, see wq_has_sleeper() */
Thanks for the patch! Sadly it doesn't fix the problem, as the kernel still sees ctx->rings->sq.tail as being 0. Interestingly it worked once (not reproduceable) directly after bootup, which indicates that we at least look at the right address from kernel side. So, still needs more debugging/testing. Helge