Re: [PATCH RFC v2 00/19] fuse: fuse-over-io-uring

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Thu, 30 May 2024 15:35:38 -0400

On Thu, May 30, 2024 at 12:48:56PM -0600, Jens Axboe wrote:
> On 5/30/24 11:58 AM, Kent Overstreet wrote:
> > On Thu, May 30, 2024 at 11:28:43AM -0600, Jens Axboe wrote:
> >> I have addressed it several times in the past. tldr is that yeah the
> >> initial history of io_uring wasn't great, due to some unfortunate
> >> initial design choices (mostly around async worker setup and
> >> identities).
> > 
> > Not to pick on you too much but the initial history looked pretty messy
> > to me - a lot of layering violations - it made aio.c look clean.
> 
> Oh I certainly agree, the initial code was in a much worse state than it
> is in now. Lots of things have happened there, like splitting things up
> and adding appropriate layering. That was more of a code hygiene kind of
> thing, to make it easier to understand, maintain, and develop.
> 
> Any new subsystem is going to see lots of initial churn, regardless of
> how long it's been developed before going into upstream. We certainly
> had lots of churn, where these days it's stabilized. I don't think
> that's unusual, particularly for something that attempts to do certain
> things very differently. I would've loved to start with our current
> state, but I don't think years of being out of tree would've completely
> solved that. Some things you just don't find until it's in tree,
> unfortunately.

Well, the main thing I would've liked is a bit more discussion in the
early days of io_uring; there are things we could've done differently
back then that could've got us something cleaner in the long run.

My main complaints were always
 - yet another special purpose ringbuffer, and
 - yet another parallel syscall interface.

We've got too many of those too (aio is another), and the API
fragmentation is a real problem for userspace that just wants to be able
to issue arbitrary syscalls asynchronously. IO uring could've just been
serializing syscall numbers and arguments - that would have worked fine.

Given the history of failed AIO replacements just wanting to shove in
something working was understandable, but I don't think those would have
been big asks.

> > I'd also really like to see some more standardized mechanisms for "I'm a
> > kernel thread doing work on behalf of some other user thread" - this
> > comes up elsewhere, I'm talking with David Howells right now about
> > fsconfig which is another place it is or will be coming up.
> 
> That does exist, and it came from the io_uring side of needing exactly
> that. This is why we have create_io_thread(). IMHO it's the only sane
> way to do it, trying to guesstimate what happens deep down in a random
> callstack, and setting things up appropriately, is impossible. This is
> where most of the earlier day io_uring issues came from, and what I
> referred to as a "poor initial design choice".

Thanks, I wasn't aware of that - that's worth highlighting. I may switch
thread_with_file to that, and the fsconfig work David and I are talking
about can probably use it as well.

We really should have something lighter weight that we can use for work
items though, that's our standard mechanism for deferred work, not
spinning up a whole kthread. We do have kthread_use_mm() - there's no
reason we couldn't do an expanded version of that for all the other
shared resources that need to be available.

This was also another blocker in the other aborted AIO replacements, so
reusing clone flags does seem like the most reasonable way to make
progress here, but I would wager there's other stuff in task_struct that
really should be shared and isn't. task_struct is up to 825 (!) lines
now, which means good luck on even finding that stuff - maybe at some
point we'll have to get a giant effort going to clean up and organize
task_struct, like willy's been doing with struct page.

> >> Those have since been rectified, and the code base is
> >> stable and solid these days.
> > 
> > good tests, code coverage analysis to verify, good syzbot coverage?
> 
> 3x yes. Obviously I'm always going to say that tests could be better,
> have better coverage, cover more things, because nothing is perfect (and
> if you think it is, you're fooling yourself) and as a maintainer I want
> perfect coverage. But we're pretty diligent these days about adding
> tests for everything. And any regression or bug report always gets test
> cases written.

*nod* that's encouraging. Looking forward to the umount issue being
fixed so I can re-enable it in my tests...