Hi Vlastimil,
On Wed, Nov 20, 2024 at 9:47 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
On 11/20/24 09:19, Geert Uytterhoeven wrote:
On Tue, Nov 19, 2024 at 11:30 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
On 11/19/24 2:46 PM, Guenter Roeck wrote:
On 11/19/24 11:49, Jens Axboe wrote:
On 11/19/24 12:44 PM, Jens Axboe wrote:
On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
On 11/19/24 08:02, Jens Axboe wrote:
On 11/19/24 8:36 AM, Guenter Roeck wrote:
On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
Doesn't matter right now as there's still some bytes left for it, but
let's prepare for the io_kiocb potentially growing and add a specific
freeptr offset for it.
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
This patch triggers:
Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
Stack from 00c63e5c:
00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
Call Trace: [<004b9044>] dump_stack+0xc/0x10
[<004ae21e>] panic+0xc4/0x252
[<000c6974>] __kmem_cache_create_args+0x216/0x26c
[<004a72c2>] strcpy+0x0/0x1c
[<0002cb62>] parse_args+0x0/0x1f2
[<000c675e>] __kmem_cache_create_args+0x0/0x26c
[<004adb58>] memset+0x0/0x8c
[<0076f28a>] io_uring_init+0x4c/0xca
[<0076f23e>] io_uring_init+0x0/0xca
[<000020e0>] do_one_initcall+0x32/0x192
[<0076f23e>] io_uring_init+0x0/0xca
[<0000211c>] do_one_initcall+0x6e/0x192
[<004a72c2>] strcpy+0x0/0x1c
[<0002cb62>] parse_args+0x0/0x1f2
[<000020ae>] do_one_initcall+0x0/0x192
[<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
[<0076f23e>] io_uring_init+0x0/0xca
[<004b911a>] kernel_init+0x0/0xec
[<004b912e>] kernel_init+0x14/0xec
[<004b911a>] kernel_init+0x0/0xec
[<0000252c>] ret_from_kernel_thread+0xc/0x14
when trying to boot the m68k:q800 machine in qemu.
An added debug message in create_cache() shows the reason:
#### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
freeptr_offset would need to be 4-byte aligned but that is not the
case on m68k.
Why is ->work 2-byte aligned to begin with on m68k?!
My understanding is that m68k does not align pointers.
The minimum alignment for multi-byte integral values on m68k is
2 bytes.
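To make the layout consequence concrete, here is a minimal sketch, assuming GCC or Clang attribute syntax. The type u32_m68k and struct demo are hypothetical names, not kernel code; they mimic a 4-byte integer with 2-byte alignment so the effect can be reproduced on any host:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical type: a 4-byte integer that only requires 2-byte
 * alignment, imitating how m68k treats int, long, and pointers.
 */
typedef uint32_t u32_m68k __attribute__((aligned(2)));

struct demo {
	uint16_t tag;	/* occupies offsets 0..1 */
	u32_m68k value;	/* placed at offset 2, no padding before it */
};

/* The 4-byte member ends up at an offset that is not a multiple of 4. */
_Static_assert(offsetof(struct demo, value) == 2, "value follows tag directly");
_Static_assert(sizeof(struct demo) == 6, "no tail padding either");
```

With natural 4-byte alignment the compiler would insert two bytes of padding after tag; with 2-byte alignment it does not, which is how a member such as ->work can land on an offset like 154.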
See also the comment at
https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
Maybe it's time we put m68k to bed? :-)
We can add a forced alignment ->work to be 4 bytes, won't change
anything on anything remotely current. But does feel pretty hacky to
need to align based on some ancient thing.
Why does freeptr_offset need to be 4-byte aligned?
Didn't check, but it's slab/slub complaining using a 2-byte aligned
address for the free pointer offset. It's explicitly checking:
	/* If a custom freelist pointer is requested make sure it's sane. */
	err = -EINVAL;
	if (args->use_freeptr_offset &&
	    (args->freeptr_offset >= object_size ||
	     !(flags & SLAB_TYPESAFE_BY_RCU) ||
	     !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
	                                       ^^^^^^
		goto out;
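Plugging the values from the debug message into that check shows which clause fails. This is a userspace sketch, not the kernel code: is_aligned() and freeptr_offset_ok() are hypothetical stand-ins, and the rcu argument is assumed to be true (that SLAB_TYPESAFE_BY_RCU is set in flags=0x310 is an assumption here).

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's IS_ALIGNED(); 'a' must be a power of two. */
static bool is_aligned(size_t x, size_t a)
{
	return (x & (a - 1)) == 0;
}

/*
 * Simplified stand-in for the sanity check quoted above.
 * 'rcu' represents (flags & SLAB_TYPESAFE_BY_RCU) and 'freeptr_size'
 * represents sizeof(freeptr_t); both are passed in so the check can
 * be evaluated with the m68k numbers on any host.
 */
static bool freeptr_offset_ok(size_t offset, size_t object_size,
			      bool rcu, size_t freeptr_size)
{
	return offset < object_size && rcu &&
	       is_aligned(offset, freeptr_size);
}
```

With freeptr_offset=154, object_size=182 and sizeof(freeptr_t)=4, the offset is in range but 154 is not a multiple of 4, so the check fails and -EINVAL is returned. The same offset would pass if the check used an alignment of 2.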
It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
(freeptr_t is sort of a long). If freeptr_offset must be a multiple of
4 or 8 bytes, the code that assigns it must make sure that is true.
Right, this is what the email is about...
I guess this is the code in fs/file_table.c:
.freeptr_offset = offsetof(struct file, f_freeptr),
which references:
include/linux/fs.h: freeptr_t f_freeptr;
I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
(or __aligned(sizeof(long))) to the definition of freeptr_t:
include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
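A rough sketch of what forcing the alignment on the type itself would look like, assuming GCC or Clang attribute syntax. The names u32_m68k, freeptr_demo_t and struct holder_demo are hypothetical; u32_m68k imitates a 4-byte long with 2-byte alignment so the behaviour is visible on any host. The sketch aligns to the size of the member, which corresponds to the sizeof(long) variant mentioned above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an m68k 'unsigned long': 4 bytes, 2-byte aligned. */
typedef uint32_t u32_m68k __attribute__((aligned(2)));

/*
 * Stand-in for freeptr_t with its alignment raised to its size.
 * The attribute is attached to the struct type, so every user of
 * the typedef inherits it.
 */
typedef struct {
	u32_m68k v;
} __attribute__((aligned(sizeof(u32_m68k)))) freeptr_demo_t;

struct holder_demo {
	uint16_t tag;		/* offsets 0..1 */
	freeptr_demo_t fp;	/* padded up to offset 4 */
};

_Static_assert(_Alignof(freeptr_demo_t) == 4, "alignment was raised");
_Static_assert(offsetof(struct holder_demo, fp) == 4, "offset is a multiple of 4");
```

Any structure embedding the type then gets a suitably aligned offset without a per-user annotation, at the cost of possible padding on architectures with weaker default alignment.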
It's not, it's struct io_kiocb->work, as per the stack trace in this
email.
Sorry, I was falling out of thin air into this thread...
linux-next/master:io_uring/io_uring.c: .freeptr_offset = offsetof(struct io_kiocb, work),
linux-next/master:io_uring/io_uring.c: .use_freeptr_offset = true,
Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
It just needs the space, should not matter otherwise. But may as well
just add the union and align the freeptr so it stops complaining on m68k.
Ala the below, perhaps alignment takes care of itself then?
No, that doesn't work (I tried), at least not on its own, because the pointer
is still unaligned on m68k.
Yeah we'll likely need to force it. The below should work, I presume?
Feels pretty odd to have to align it to the size of it, when that should
naturally occur... Crusty legacy archs.
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 593c10a02144..8ed9c6923668 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -674,7 +674,11 @@ struct io_kiocb {
 	struct io_kiocb *link;
 	/* custom credentials, valid IFF REQ_F_CREDS is set */
 	const struct cred *creds;
-	struct io_wq_work work;
+
+	union {
+		struct io_wq_work work;
+		freeptr_t freeptr __aligned(sizeof(freeptr_t));
I'd rather add the __aligned() to the definition of freeptr_t, so it
applies to all (future) users.
But my main question stays: why is the slab code checking
IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))?
I believe it's to match how SLUB normally calculates the offset if no
explicit one is given, in calculate_sizes():
s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
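For comparison, the default computation can be worked through with the object size from the debug message. This is a userspace sketch: align_down() and default_freeptr_offset() are hypothetical stand-ins for the kernel macro and the line quoted above, and the pointer size is a parameter so both 32-bit and 64-bit cases can be evaluated on any host.

```c
#include <stddef.h>

/* Stand-in for the kernel's ALIGN_DOWN(); 'a' must be a power of two. */
static size_t align_down(size_t x, size_t a)
{
	return x & ~(a - 1);
}

/*
 * Mirrors s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *)),
 * with sizeof(void *) passed in as 'ptr_size'.
 */
static size_t default_freeptr_offset(size_t object_size, size_t ptr_size)
{
	return align_down(object_size / 2, ptr_size);
}
```

For object_size=182 the midpoint is 91, which rounds down to 88 for both 4-byte and 8-byte pointers. The default offset is therefore always a multiple of the pointer size, and the explicit-offset check enforces the same property.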
Yes there's a sizeof(void *) because freepointer used to be just that and we
forgot to update this place when freeptr_t was introduced (by Jann in
44f6a42d49350) for handling CONFIG_SLAB_FREELIST_HARDENED. In
get_freepointer() you can see how there's a cast to a pointer eventually.
Does m68k have different alignment for pointer and unsigned long or both are
2 bytes? Or any other arch, i.e. should get_freepointer be a union with
unsigned long and void * instead? (or it doesn't matter?)
The default alignment for int, long, and pointer is 2 on m68k.
On CRIS (no longer supported by Linux), it was 1, IIRC.
So the union won't make a difference.
Perhaps that was just intended to be __alignof__ instead of sizeof()?
Would it do the right thing everywhere, given the explanation above?
It depends. Does anything rely on the offset being a multiple of (at
least) 4?
E.g. does anything count in multiples of longs (hi BCPL! ;-), or are
the 2 LSB used for a special purpose (cfr. maple_tree, which uses bit 0, see
https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46)?
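The reason the low bits matter can be sketched generically. For a power-of-two alignment, only that many low bits of an aligned value are guaranteed to be zero and are therefore available for tagging. The helper free_low_bits() below is a hypothetical illustration, not something taken from the slab or maple tree code:

```c
#include <stddef.h>

/*
 * Number of low bits guaranteed to be zero in any value aligned to
 * 'align' (a power of two), i.e. log2(align).
 */
static unsigned int free_low_bits(size_t align)
{
	unsigned int bits = 0;

	while (align > 1) {
		align >>= 1;
		bits++;
	}
	return bits;
}
```

With 2-byte alignment only bit 0 is guaranteed clear, while 4-byte alignment frees bits 0 and 1. Relaxing the check to __alignof__ would be safe only if no code depends on the second bit, or on the offset being a whole number of longs.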
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds