On 1/22/20 10:28 AM, Jerome Glisse wrote:
> On Wed, Jan 22, 2020 at 10:04:44AM -0700, Jens Axboe wrote:
>> On 1/22/20 9:54 AM, Jerome Glisse wrote:
>>> On Wed, Jan 22, 2020 at 08:12:51AM -0700, Jens Axboe wrote:
>>>> On 1/22/20 4:59 AM, Michal Hocko wrote:
>>>>> On Tue 21-01-20 20:57:23, Jerome Glisse wrote:
>>>>>> We can also discuss what kind of knobs we want to expose so that
>>>>>> people can decide to choose the tradeoff themselves (ie from "I want low
>>>>>> latency io-uring and I don't care whether mm cannot do its business" to
>>>>>> "I want mm to never be impeded in its business and I accept the extra
>>>>>> latency bursts I might face in io operations").
>>>>>
>>>>> I do not think it is a good idea to make this configurable. How can
>>>>> people sensibly choose between the two without deep understanding of
>>>>> internals?
>>>>
>>>> Fully agree, we can't just punt this to a knob and call it good, that's
>>>> a typical fallacy of core changes. And there is only one mode for
>>>> io_uring, and that's consistent low latency. If this change introduces
>>>> weird reclaim, compaction or migration latencies, then that's a
>>>> non-starter as far as I'm concerned.
>>>>
>>>> And what do those two settings even mean? I don't even know, and a user
>>>> sure as hell doesn't either.
>>>>
>>>> io_uring pins two types of pages - registered buffers, these are used
>>>> for actual IO, and the rings themselves. The rings are not used for IO,
>>>> just used to communicate between the application and the kernel.
>>>
>>> So, do we still want to solve file-backed page writeback if the pages in
>>> the user buffer come from a file?
>>
>> That's not currently a concern for io_uring, as it disallows file-backed
>> pages for the IO buffers that are being registered.
>>
>>> Also we can introduce a flag when registering a buffer that allows
>>> registering the buffer without pinning and thus avoids the RLIMIT_MEMLOCK,
>>> at the cost of possible latency spikes. Then the user registering the
>>> buffer knows what he gets.
>>
>> That may be fine for other users, but I don't think it'll apply
>> to io_uring. I can't see anyone selecting that flag, unless you're
>> doing something funky where you're registering a substantial amount
>> of the system memory for IO buffers. And I don't think that's going
>> to be a super valid use case...
>
> Given datasets are getting bigger and bigger, I would assume that we
> will have people who want to use io-uring with large buffers.
>
>>
>>> Maybe it would be good to test, it might stay in the noise, then it
>>> might be a good thing to do. Also there are strategies to avoid latency
>>> spikes; for instance we can block/force-skip mm invalidation if the buffer
>>> has pending/running io in the ring, ie only have buffer invalidation
>>> happen when there is no pending/running submission entry.
>>
>> Would that really work? The buffer could very well be idle right when
>> you check, but wanting to do IO the instant you decide you can do
>> background work on it. Additionally, that would require accounting
>> on when the buffers are inflight, which is exactly the kind of
>> overhead we're trying to avoid to begin with.
>>
>>> We can also pick what kind of invalidation we allow (compaction,
>>> migration, ...) and thus limit the scope and likelihood of
>>> invalidation.
>>
>> I think it'd be useful to try and understand the use case first.
>> If we're pinning a small percentage of the system memory, do we
>> really care at all? Isn't it completely fine to just ignore?
>
> My main motivation is migration in NUMA systems: if the process that
> registered the buffer gets migrated to a different node, then it might
> actually end up with bad performance because its io buffers are still
> on the old node. I am not sure we want to tell application developers to
> constantly monitor which node they are on and to re-register buffers
> after process migration to allow for memory migration.

If the process truly cares, would it not have pinned itself to that node?

-- 
Jens Axboe
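
To make the closing question concrete, below is a minimal sketch (not from the
thread itself) of how an application that cares about locality could pin its
task to a NUMA node, allocate its IO buffer on that node, and register it with
io_uring as a fixed buffer. It assumes libnuma and liburing are available; the
node number, buffer size, and error handling are illustrative only.
Registration is the step that pins the pages and accounts them against
RLIMIT_MEMLOCK.

/*
 * Illustrative sketch only: bind the task to one NUMA node, allocate the
 * IO buffer on that same node, and register it as an io_uring fixed buffer.
 * Assumes libnuma and liburing are installed.
 * Build with: gcc -o fixed_buf fixed_buf.c -luring -lnuma
 */
#include <liburing.h>
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

#define BUF_SIZE (1UL << 20)    /* 1 MiB registered buffer, size is arbitrary */

int main(void)
{
    struct io_uring ring;
    struct iovec iov;
    void *buf;
    int node = 0;               /* example node; a real app would query its topology */
    int ret;

    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    /* Keep the task on the chosen node so CPU and buffer stay local. */
    if (numa_run_on_node(node) < 0) {
        fprintf(stderr, "numa_run_on_node failed\n");
        return 1;
    }

    /*
     * Anonymous memory allocated on that node; file-backed pages are not
     * allowed for registered buffers anyway.
     */
    buf = numa_alloc_onnode(BUF_SIZE, node);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }
    memset(buf, 0, BUF_SIZE);   /* touch the pages before registering */

    ret = io_uring_queue_init(8, &ring, 0);
    if (ret < 0) {
        fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret));
        return 1;
    }

    iov.iov_base = buf;
    iov.iov_len = BUF_SIZE;

    /* This is what pins the pages; accounted against RLIMIT_MEMLOCK. */
    ret = io_uring_register_buffers(&ring, &iov, 1);
    if (ret < 0) {
        fprintf(stderr, "io_uring_register_buffers: %s\n", strerror(-ret));
        return 1;
    }

    /* ... submit IORING_OP_READ_FIXED / IORING_OP_WRITE_FIXED with buffer index 0 ... */

    io_uring_unregister_buffers(&ring);
    io_uring_queue_exit(&ring);
    numa_free(buf, BUF_SIZE);
    return 0;
}

With numa_run_on_node() the task is not scheduled away from the node, so the
pinned buffer stays local for the lifetime of the registration, which is
roughly the arrangement the closing question alludes to.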