Re: [GIT PULL] io_uring xattr support

Jens Axboe <axboe@xxxxxxxxx> · Mon, 23 May 2022 13:59:14 -0600

On 5/23/22 1:41 PM, Linus Torvalds wrote:
> On Sun, May 22, 2022 at 2:26 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On top of the core io_uring changes, this pull request includes support
>> for the xattr variants.
> 
> So I don't mind the code (having seen the earlier versions), but
> looking at this all I *do* end up reacting to this part:
> 
>     [torvalds@ryzen linux]$ wc -l fs/io_uring.c
>     12744 fs/io_uring.c
> 
> and no, this is not due to this xattr pull, but the xattr code did add
> another few hundred lines of "io_uring command boilerplate for another
> command" to this file that is a nasty file from hell.

I know, it really is a monster file at this point...

> I really think that it might be time to start thinking about splitting
> that io_uring.c file up. Make it a directory, and have the core
> command engine in io_uring/core.c, and then have the different actual
> IO_URING_OP_xyz handling in separate files.

I've been pondering that for a while actually, and yes I agree it's time
to organize this a bit differently. When you are in this code all the
time you notice less as you know where everything is, but it would be
nice to take the time to split it into some manageable and separately
readable/maintainable pieces.

> And yes, that would probably necessitate making the OP handling use
> more of a dispatch table approach, but wouldn't that be good anyway?
> That io_uring.c file is starting to have a lot of *big* switch
> statements for the different cases.
> 
> Wouldn't it be nice to have a "op descriptor array" instead of the
> 
>         switch (req->opcode) {
>         ...
>         case IORING_OP_WRITE:
>                 return io_prep_rw(req, sqe);
>         ...
> 
> kind of tables?
> 
> Yes, the compiler may end up generating a binary-tree
> compare-and-branch thing for a switch like that, and it might be
> better than an indirect branch in these days of spectre costs for
> branch prediction safety, but if we're talking a few tens of cycles
> per op, that's probably not really a big deal.

I was resistant to the indirect function call initially because of the
spectre overhead, but that was when the table was a lot smaller. The
tides may indeed have shifted on this now that the table has grown to
the size that it has. Plus we have both a prep handler and issue handler
for each, so you end up with two massive switches to deal with that.

> And from a maintenenace standpoint, I really think it would be good to
> try to try to walk away from those "case IORING_OP_xyz" things, and
> try to split things up into more manageable pieces.
> 
> Hmm?

As mentioned, it's something that I have myself been thinking about for
the past few releases. It's not difficult work and can be done in a
sequential kind of manner, but it will add some pain in terms of
backports. Nothing _really_ major, but... Longer term it'll be nicer for
sure, which is the most important bit.

I've got some ideas on how to split the core bits, and the
related-op-per file kind of idea for the rest makes sense. Eg net
related bits can go in one, or maybe we can even go finer grained and
(almost) do per-op.

I'll spend some time after the merge window to try and get this sorted
out.

-- 
Jens Axboe