On Sun, Oct 14, 2018 at 8:47 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On Fri, Oct 5, 2018 at 6:17 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 4, 2018 at 11:35 PM Johannes Thumshirn <jthumshirn@xxxxxxx> wrote:
> > >
> > > On Thu, Oct 04, 2018 at 11:25:24PM -0700, Christoph Hellwig wrote:
> > > > Since when is an article on some website a promise (of what exactly)
> > > > by linux kernel developers?
> > >
> > > Let's stop it here, this doesn't make any sort of forward progress.
> > >
> >
> > I do think there is some progress we can make if we separate DAX as an
> > access mechanism vs DAX as a resource utilization contract. My attempt
> > at representing Christoph's position is that the kernel should not be
> > advertising / making access mechanism guarantees. That makes sense.
> > Even with MAP_SYNC+DAX the kernel reserves the right to write-protect
> > mappings at will and trap access into a kernel handler. Additionally,
> > whether read(2) / write(2) does anything different behind the scenes
> > in DAX mode, or not should be irrelevant to the application.
> >
> > That said what is certainly not irrelevant is a kernel giving
> > userspace visibility and control into resource utilization. Jan's
> > MADV_DIRECT_ACCESS lets the application make assumptions about page
> > cache utilization, we just need another mechanism to read if a
> > mapping is effectively already in that state.
>
> I thought more about this today while reviewing the virtio-pmem driver
> that will behave mostly like a DAX-capable pmem device except it will
> be implemented by passing host page cache through to the guest as a
> pmem device with a paravirtualized / asynchronous flush interface.
> MAP_SYNC obviously needs to be disabled for this case, but we still need
> to allow some semblance of DAX operation to save allocating page cache
> in the guest. The need to explicitly clarify the state of DAX is
> growing with the different nuances of DAX operation.
>
> Let's use a new MAP_DIRECT flag to positively assert that a given
> mmap() call is setting up a memory mapping without page-cache or
> buffered indirection. To be clear not my original MAP_DIRECT proposal
> from a while back, instead just a flag to mmap() that causes the
> mapping attempt to fail if there is any software buffering fronting
> the memory mapping, or any requirement for software to manage flushing
> outside of pushing writes through the cpu cache. This way, if we ever
> extend MAP_SYNC for a buffered use case we can still definitely assert
> that the mapping is "direct". So, MAP_DIRECT would fail for
> traditional non-DAX block devices, and for this new virtio-pmem case.
> It would also fail for any pmem device where we cannot assert that the
> platform will take care of flushing write-pending-queues on power-loss
> events.

After letting this sit for a few days I think I'm back to liking
MADV_DIRECT_ACCESS more since madvise() is more closely related to the
page-cache management than mmap. It does not solve the query vs enable
problem, but it's still a step towards giving applications what they
want with respect to resource expectations.

Perhaps a new syscall to retrieve the effective advice for a range?

    int madvice(void *addr, size_t length, int *advice);
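
For reference, the "fail the mapping attempt if the guarantee cannot be
met" behaviour discussed for MAP_DIRECT already has a precedent in how
MAP_SYNC is requested today via MAP_SHARED_VALIDATE. The following is
only a rough sketch of that existing pattern from the application side,
not of MAP_DIRECT or MADV_DIRECT_ACCESS themselves. The helper name
map_prefer_sync() is made up for illustration, and the fallback flag
values assume an architecture that uses the asm-generic definitions
(x86-64, for example).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers. Assumption: asm-generic flag values. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper, not a kernel or libc interface.
 *
 * Ask for a MAP_SYNC mapping first. MAP_SHARED_VALIDATE makes the kernel
 * reject flags it cannot honour instead of silently ignoring them, so a
 * successful return means the MAP_SYNC request was accepted.
 * If the request is refused, fall back to an ordinary shared mapping and
 * report that through *is_sync so the caller knows what it actually got.
 */
static void *map_prefer_sync(int fd, size_t len, int *is_sync)
{
    assert(len > 0 && is_sync != NULL);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *is_sync = 1;
        return p;
    }

    /*
     * EOPNOTSUPP: the file cannot provide MAP_SYNC (e.g. not DAX).
     * EINVAL: the kernel predates MAP_SHARED_VALIDATE.
     * Anything else is a real error and is passed back to the caller.
     */
    if (errno != EOPNOTSUPP && errno != EINVAL)
        return MAP_FAILED;

    *is_sync = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Note that this only tells the caller whether the MAP_SYNC request was
accepted at mmap() time. It says nothing about page cache usage for the
range, which is the query side of the problem that a MAP_DIRECT flag or
an advice-retrieval call would have to cover.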