On Wed, Jan 22, 2020 at 10:04:44AM -0700, Jens Axboe wrote:
> On 1/22/20 9:54 AM, Jerome Glisse wrote:
> > On Wed, Jan 22, 2020 at 08:12:51AM -0700, Jens Axboe wrote:
> >> On 1/22/20 4:59 AM, Michal Hocko wrote:
> >>> On Tue 21-01-20 20:57:23, Jerome Glisse wrote:
> >>>> We can also discuss what kind of knobs we want to expose so that
> >>>> people can decide to choose the tradeoff themselves (ie from "I want
> >>>> low-latency io_uring and I don't care whether mm can not do its
> >>>> business" to "I want mm to never be impeded in its business and I
> >>>> accept the extra latency bursts I might face in io operations").
> >>>
> >>> I do not think it is a good idea to make this configurable. How can
> >>> people sensibly choose between the two without deep understanding of
> >>> internals?
> >>
> >> Fully agree, we can't just punt this to a knob and call it good, that's
> >> a typical fallacy of core changes. And there is only one mode for
> >> io_uring, and that's consistent low latency. If this change introduces
> >> weird reclaim, compaction or migration latencies, then that's a
> >> non-starter as far as I'm concerned.
> >>
> >> And what do those two settings even mean? I don't even know, and a user
> >> sure as hell doesn't either.
> >>
> >> io_uring pins two types of pages - registered buffers, these are used
> >> for actual IO, and the rings themselves. The rings are not used for IO,
> >> just used to communicate between the application and the kernel.
> >
> > So, do we still want to solve file-backed page writeback if the pages
> > in the user buffer are from a file?
>
> That's not currently a concern for io_uring, as it disallows file backed
> pages for the IO buffers that are being registered.
>
> > Also we can introduce a flag when registering a buffer that allows to
> > register the buffer without pinning and thus avoid the RLIMIT_MEMLOCK,
> > at the cost of possible latency spikes. Then the user registering the
> > buffer knows what he gets.
>
> That may be fine for other users, but I don't think it'll apply
> to io_uring. I can't see anyone selecting that flag, unless you're
> doing something funky where you're registering a substantial amount
> of the system memory for IO buffers. And I don't think that's going
> to be a super valid use case...

Given that datasets are getting bigger and bigger, I would assume that we
will have people who want to use io_uring with large buffers.

> > Maybe it would be good to test, it might stay in the noise, then it
> > might be a good thing to do. Also there are strategies to avoid latency
> > spikes, for instance we can block/force-skip mm invalidation if the
> > buffer has pending/running io in the ring, ie only have buffer
> > invalidation happen when there is no pending/running submission entry.
>
> Would that really work? The buffer could very well be idle right when
> you check, but wanting to do IO the instant you decide you can do
> background work on it. Additionally, that would require accounting
> on when the buffers are inflight, which is exactly the kind of
> overhead we're trying to avoid to begin with.
>
> > We can also pick what kind of invalidation we allow (compaction,
> > migration, ...) and thus limit the scope and likelihood of
> > invalidation.
>
> I think it'd be useful to try and understand the use case first.
> If we're pinning a small percentage of the system memory, do we
> really care at all? Isn't it completely fine to just ignore?
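(For anyone following the thread who has not used the interface: the
registration path we keep coming back to is the fixed-buffer one. The
sketch below is purely illustrative, not code from this series; it is the
io_uring_register_buffers() call that pins the pages and accounts them
against RLIMIT_MEMLOCK. Error handling is trimmed.)

/*
 * Illustrative sketch: register one fixed buffer with liburing and
 * use it for a read.  io_uring_register_buffers() is the call that
 * pins the buffer pages for the lifetime of the registration.
 */
#include <liburing.h>
#include <errno.h>
#include <stdlib.h>

#define BUF_SIZE	(1UL << 20)

static int setup_fixed_buffer(struct io_uring *ring, struct iovec *iov)
{
	int ret;

	ret = io_uring_queue_init(8, ring, 0);
	if (ret < 0)
		return ret;

	iov->iov_base = malloc(BUF_SIZE);
	if (!iov->iov_base)
		return -ENOMEM;
	iov->iov_len = BUF_SIZE;

	/* Pins the buffer pages and charges them to RLIMIT_MEMLOCK. */
	return io_uring_register_buffers(ring, iov, 1);
}

static int read_into_fixed(struct io_uring *ring, struct iovec *iov, int fd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	/* buf_index 0 refers to the buffer registered above. */
	io_uring_prep_read_fixed(sqe, fd, iov->iov_base, iov->iov_len, 0, 0);
	return io_uring_submit(ring);
}

The pin lasts for the lifetime of the registration, which is exactly why
migration/compaction can not touch those pages today.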
My main motivation is migration on NUMA systems: if the process that
registered the buffer gets migrated to a different node, it might actually
end up with bad performance because its IO buffers are still on the old
node. I am not sure we want to tell application developers to constantly
monitor which node they are on and to re-register buffers after process
migration just to allow the memory to migrate (a rough sketch of that
dance is in the P.S. below).

Cheers,
Jérôme
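P.S. To make the "monitor your node and re-register" dance concrete, below
is roughly what every application would have to do by hand. This is a
sketch only, assuming libnuma and liburing; the helper name
rebind_buffer_if_migrated() is made up for illustration and error handling
is minimal.

/*
 * Illustrative only: detect that the task now runs on a different NUMA
 * node, migrate the (temporarily unpinned) buffer pages there, and
 * re-register the buffer with the ring.
 */
#define _GNU_SOURCE
#include <liburing.h>
#include <numa.h>
#include <sched.h>
#include <sys/uio.h>

static int rebind_buffer_if_migrated(struct io_uring *ring,
				     struct iovec *iov, int *last_node)
{
	int node = numa_node_of_cpu(sched_getcpu());
	int ret;

	if (node < 0 || node == *last_node)
		return 0;

	/* Buffers must be unregistered before their pages can move. */
	ret = io_uring_unregister_buffers(ring);
	if (ret < 0)
		return ret;

	/* Ask the kernel to migrate the now-unpinned pages to this node. */
	numa_tonode_memory(iov->iov_base, iov->iov_len, node);

	*last_node = node;
	return io_uring_register_buffers(ring, iov, 1);
}

Having every io_uring user carry something like this, and pay an
unregister/register cycle on every migration, seems much worse than
letting the kernel move the pages.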