Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme

On Wed, Jan 22, 2020 at 07:56:50AM -0800, Dan Williams wrote:
> On Tue, Jan 21, 2020 at 9:04 PM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> >
> > On Tue, Jan 21, 2020 at 08:19:54PM -0800, Dan Williams wrote:
> > > On Tue, Jan 21, 2020 at 6:34 PM <jglisse@xxxxxxxxxx> wrote:
> > > >
> > > > From: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > > >
> > > > Direct I/O pins memory through GUP (get_user_pages), which blocks
> > > > several mm activities, such as:
> > > >     - compaction
> > > >     - NUMA balancing
> > > >     - migration
> > > >     ...
> > > >
> > > > It is also troublesome if the pinned pages are actually file-backed
> > > > pages that might go under writeback. In that case the page cannot
> > > > be write-protected from the direct-io point of view (see various
> > > > discussions about recent work on GUP [1]). This happens, for
> > > > instance, if the virtual memory address used as the buffer for a
> > > > read operation is the outcome of an mmap of a regular file.
> > > >
> > > >
> > > > With direct-io or aio (asynchronous io), pages are pinned until
> > > > syscall completion (which depends on many factors: io size,
> > > > block device speed, ...). For io-uring, pages can be pinned for
> > > > an indefinite amount of time.
> > > >
> > > >
> > > > So I would like to convert the direct io code (direct-io, aio and
> > > > io-uring) to obey mmu notifiers and thus allow memory management
> > > > and writeback to work and behave like any other process memory.
> > > >
> > > > For direct-io and aio this mostly gives a way to wait on syscall
> > > > completion. For io-uring this means that the buffer might need to
> > > > be re-validated (ie looking up pages again to get the new set of
> > > > pages for the buffer). The impact for io-uring is the delay needed
> > > > to look up new pages or wait on writeback (if necessary). This
> > > > would only happen _if_ an invalidation event happens, which itself
> > > > should only happen under memory pressure or for NUMA activities.
> > >
> > > This seems to assume that memory pressure and NUMA migration are rare
> > > events. Some of the proposed hierarchical memory management schemes
> > > [1] might impact that assumption.
> > >
> > > [1]: http://lore.kernel.org/r/20191101075727.26683-1-ying.huang@xxxxxxxxx/
> > >
> >
> > Yes, it is true that it will likely become more and more of an issue.
> > We are facing a tough choice here: pinning blocks NUMA balancing and
> > any kind of migration, and thus might impede performance, while
> > invalidating an io-uring buffer will also cause a small latency burst.
> > I do not think we can make everyone happy, but at the very least we
> > should avoid pinning and provide knobs to let users decide what they
> > care about more (ie io without bursts or better NUMA locality).
> 
> It's a question of tradeoffs, and this proposal seems to have already
> decided that the question should be answered in favor of a GPU/SVM
> centric view of the world without presenting the alternative.
> Direct-I/O colliding with GPU operations might also be solved by
> always triggering a migration, and applications that care would avoid
> colliding operations that slow down their GPU workload. A slow compat
> fallback that applications can programmatically avoid is more flexible
> than an upfront knob.

To make it clear, I do not care about direct I/O colliding with anything,
GPU or otherwise; anything like that is up to the application programmer.

My sole interest is in page pinning that blocks compaction and migration.
The former impedes the kernel's ability to materialize huge pages; the
latter can impact performance badly, including for the direct i/o user.
For instance, if a process using io-uring gets migrated to a different
node after registering its buffer, then it will keep using memory from a
different node, which in the end might be much worse than the one-time
extra latency spike the migration incurs.

Cheers,
Jérôme
