Hello, Dave.

On Tue, May 05, 2020 at 04:41:14PM +1000, Dave Chinner wrote:
> > OTOH I don't have a great idea how the generic infrastructure should
> > look like...
>
> I haven't given it any thought - it's not something I have any
> bandwidth to spend time on. I'll happily review a unified
> generic cgroup-aware kthread-based IO dispatch mechanism, but I
> don't have the time to design and implement that myself....
>
> OTOH, I will make time to stop people screwing up filesystems and
> block devices with questionable complexity and unique, storage
> device dependent userspace visible error behaviour. This sort of
> change is objectively worse for users than not supporting the
> functionality in the first place.

That probably is too strong a position to hold without spending at
least some thought on the subject, whatever the subject may be, and it
doesn't seem like your understanding of the userspace implications is
accurate.

I don't necessarily disagree that it'd be nice to have common
infrastructure, and there may be some parts which can actually be
factored out. However, there isn't gonna be a magic bullet which
magically makes every IO thing in the kernel cgroup-aware
automatically. Please consider the following:

* Avoiding IO priority inversions requires splitting IO channels
  according to cgroups, and working around it (e.g. with backcharging)
  when they can't be split. It's a substantial feature which may
  require substantial changes. Each IO subsystem has different
  constraints and existing structures, and many of them would require
  their own solutions. This is no different from different filesystems
  needing their own solutions to similar problems.

* Because different filesystems and IO stacking layers already have
  their own internal infrastructure, the right way to add cgroup
  support is adapting to and modifying the existing infrastructure
  rather than trying to restructure them to use the same cgroup
  mechanism, which I don't think would be possible in many cases.
* Among the three IO stacking / redirecting mechanisms - md/dm, loop
  and fuse - the requirements and what's possible vary quite a bit.
  md/dm definitely need full-on IO channel splitting cgroup support.
  loop can go either way, but given its existing uses, full splitting
  makes sense. fuse, as it currently stands, can't support that
  because the priority inversions extend all the way into userspace
  and the kernel API isn't built for that. If it wants to support
  cgroup containment, each instance would have to be assigned to a
  cgroup.

Between dm/md and loop, it's maybe possible that some of the
sub-threading code can be reused, but I don't see a point in blocking
the loop updates given that they clearly fix userspace-visible
malfunctions, aren't that much code, and how the shared code should
look isn't clear yet. We'll be able to answer the sharing question
when we actually get to the dm/md conversion.

Thanks.

-- 
tejun