Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

On Sun, Oct 14, 2018 at 8:47 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On Fri, Oct 5, 2018 at 6:17 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 4, 2018 at 11:35 PM Johannes Thumshirn <jthumshirn@xxxxxxx> wrote:
> > >
> > > On Thu, Oct 04, 2018 at 11:25:24PM -0700, Christoph Hellwig wrote:
> > > > Since when is an article on some website a promise (of what exactly)
> > > > by linux kernel developers?
> > >
> > > Let's stop it here, this doesn't make any sort of forward progress.
> > >
> >
> > I do think there is some progress we can make if we separate DAX as an
> > access mechanism vs DAX as a resource utilization contract. My attempt
> > at representing Christoph's position is that the kernel should not be
> > advertising / making access mechanism guarantees. That makes sense.
> > Even with MAP_SYNC+DAX the kernel reserves the right to write-protect
> > mappings at will and trap access into a kernel handler. Additionally,
> > whether read(2) / write(2) does anything different behind the scenes
> > in DAX mode or not should be irrelevant to the application.
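Userspace can probe the MAP_SYNC contract mentioned above directly. The following is a minimal sketch. It assumes Linux 4.15 or later, or the fallback constants shown (these are the asm-generic values, and some architectures may differ). It requests `MAP_SHARED_VALIDATE | MAP_SYNC` and falls back to a plain `MAP_SHARED` mapping in two cases: when the kernel reports `EOPNOTSUPP` (the file is not DAX-capable), or when it reports `EINVAL` (the kernel predates `MAP_SHARED_VALIDATE`). The helper name `map_prefer_sync` is hypothetical. In keeping with the point above, success only means the kernel accepted the MAP_SYNC contract. It does not mean every access bypasses kernel handlers.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older headers: asm-generic values (e.g. x86, arm64).
 * Assumption: other architectures may use different values. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical helper: try a MAP_SYNC mapping first, otherwise fall back
 * to an ordinary shared mapping. *is_sync reports which one was obtained.
 * Returns MAP_FAILED (with errno set) on any other failure. */
void *map_prefer_sync(int fd, size_t len, int *is_sync)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *is_sync = 1;
        return p;
    }
    /* EOPNOTSUPP: flag understood, but the file cannot honor MAP_SYNC.
     * EINVAL: kernel does not know MAP_SHARED_VALIDATE at all. */
    if (errno != EOPNOTSUPP && errno != EINVAL)
        return MAP_FAILED;
    *is_sync = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

On an ordinary filesystem such as tmpfs this returns a regular shared mapping with `*is_sync == 0`.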
> >
> > That said, what is certainly not irrelevant is the kernel giving
> > userspace visibility into, and control over, resource utilization.
> > Jan's MADV_DIRECT_ACCESS lets the application make assumptions about
> > page cache utilization; we just need another mechanism to read whether
> > a mapping is effectively already in that state.
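One existing, if imperfect, way to read per-mapping state is the VmFlags line of /proc/<pid>/smaps. There, VM_MIXEDMAP appears as "mm". That flag is the heuristic whose removal for DAX mappings started this thread, so it is not a stable DAX indicator. The sketch below assumes a kernel whose smaps output includes VmFlags and uses a hypothetical helper name, `vma_flags_for`. It returns the VmFlags of the mapping that contains a given address.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the VmFlags field of the VMA containing addr
 * from /proc/self/smaps into out. Returns 0 on success, -1 otherwise. */
int vma_flags_for(const void *addr, char *out, size_t outlen)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f)
        return -1;
    char line[4096];
    unsigned long a = (unsigned long)(uintptr_t)addr;
    int in_vma = 0, rc = -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        /* VMA header lines look like "start-end perms offset dev inode path";
         * per-field lines ("Rss:", "VmFlags:", ...) do not parse as two
         * hex numbers joined by '-'. */
        if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
            in_vma = (a >= start && a < end);
            continue;
        }
        if (in_vma && strncmp(line, "VmFlags:", 8) == 0) {
            snprintf(out, outlen, "%s", line + 8);
            out[strcspn(out, "\n")] = '\0';  /* drop trailing newline */
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

A caller would check the result for two-letter codes such as "rd", "wr", or "mm". As the thread shows, though, the presence or absence of "mm" is an implementation detail rather than a promise about page cache use.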
>
> I thought more about this today while reviewing the virtio-pmem driver
> that will behave mostly like a DAX-capable pmem device except it will
> be implemented by passing host page cache through to the guest as a
> pmem device with a paravirtualized / asynchronous flush interface.
> MAP_SYNC obviously needs to be disabled for this case, but we still need
> to allow some semblance of DAX operation to avoid allocating page cache
> in the guest. The need to explicitly clarify the state of DAX is
> growing with the different nuances of DAX operation.
>
> Let's use a new MAP_DIRECT flag to positively assert that a given
> mmap() call is setting up a memory mapping without page-cache or
> buffered indirection. To be clear, this is not my original MAP_DIRECT
> proposal from a while back, just a flag to mmap() that causes the
> mapping attempt to fail if there is any software buffering fronting
> the memory mapping, or any requirement for software to manage flushing
> beyond pushing writes through the CPU cache. This way, if we ever
> extend MAP_SYNC for a buffered use case we can still definitely assert
> that the mapping is "direct". So, MAP_DIRECT would fail for
> traditional non-DAX block devices, and for this new virtio-pmem case.
> It would also fail for any pmem device where we cannot assert that the
> platform will take care of flushing write-pending-queues on power-loss
> events.
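A decision function can summarize the three failure conditions of the proposed MAP_DIRECT. MAP_DIRECT does not exist in mainline. The struct, its field names, and the choice of EOPNOTSUPP (borrowed from how MAP_SYNC requests are rejected) are all assumptions for illustration, not kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of what backs a file mapping; these fields do
 * not correspond to any real kernel structure. */
struct backing_props {
    bool sw_buffered;           /* page cache (guest or host) fronts the media */
    bool needs_sw_flush;        /* flushing needs more than CPU cache writeback */
    bool platform_flushes_wpq;  /* write-pending queues flushed on power loss */
};

/* Model of the proposed "fail unless direct" rule: returns 0 if a
 * MAP_DIRECT-style request could be honored, -EOPNOTSUPP otherwise. */
int check_map_direct(const struct backing_props *b)
{
    if (b->sw_buffered)
        return -EOPNOTSUPP;   /* e.g. a traditional non-DAX block device */
    if (b->needs_sw_flush)
        return -EOPNOTSUPP;   /* e.g. a paravirtualized flush interface */
    if (!b->platform_flushes_wpq)
        return -EOPNOTSUPP;   /* cannot assert durability on power loss */
    return 0;
}
```

This model treats the conditions as independent checks that all must pass. A request succeeds only when no software layer sits between the application's stores and the media.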

After letting this sit for a few days, I think I'm back to liking
MADV_DIRECT_ACCESS more, since madvise() is more closely related to
page-cache management than mmap() is. It does not solve the query vs.
enable problem, but it's still a step towards giving applications what
they want with respect to resource expectations.

Perhaps a new syscall to retrieve the effective advice for a range?

     int madvice(void *addr, size_t length, int *advice);
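The proposed query could be prototyped as a userspace model before committing to a syscall ABI. This sketch is hypothetical throughout. `model_madvice` and `struct region` stand in for the proposed madvice() and the kernel's per-VMA advice. The error choices are assumptions about semantics the proposal leaves open: ENOMEM for an unmapped hole (mirroring madvise(2)), and EINVAL when the range spans different advice values or has zero length.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA and the advice last applied to it. */
struct region {
    uintptr_t start, end;  /* half-open range [start, end) */
    int advice;            /* e.g. a MADV_* value */
};

/* Model of the proposed query over regions sorted by address and
 * non-overlapping. Returns 0 and sets *advice if [addr, addr + length)
 * is fully mapped with a single advice value; otherwise returns -1 with
 * errno set to ENOMEM (hole) or EINVAL (mixed advice, bad length). */
int model_madvice(const struct region *r, size_t n,
                  uintptr_t addr, size_t length, int *advice)
{
    uintptr_t end = addr + length;
    if (length == 0 || end < addr) {   /* empty range or wraparound */
        errno = EINVAL;
        return -1;
    }
    uintptr_t cur = addr;
    int found = 0, adv = 0;
    for (size_t i = 0; i < n && cur < end; i++) {
        if (r[i].end <= cur)
            continue;                  /* region lies entirely before cur */
        if (r[i].start > cur) {
            errno = ENOMEM;            /* unmapped hole inside the range */
            return -1;
        }
        if (found && r[i].advice != adv) {
            errno = EINVAL;            /* range spans different advice */
            return -1;
        }
        adv = r[i].advice;
        found = 1;
        cur = r[i].end;
    }
    if (cur < end) {                   /* range runs past the last region */
        errno = ENOMEM;
        return -1;
    }
    *advice = adv;
    return 0;
}
```

Reporting a single value forces a policy for ranges that cover several VMAs. Failing with EINVAL is one option. Another is to return the advice of the first VMA along with the length it covers.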



