Re: [RFC PATCH 0/7] dax, ext4: Synchronous page faults

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 27 Jul 2017 19:05:24 -0700

On Thu, Jul 27, 2017 at 2:57 PM, Ross Zwisler
<ross.zwisler@xxxxxxxxxxxxxxx> wrote:
> On Thu, Jul 27, 2017 at 10:09:07AM -0400, Jeff Moyer wrote:
>> Jan Kara <jack@xxxxxxx> writes:
>>
>> Hi, Jan,
>>
>> Thanks for looking into this!
>>
>> > There are couple of open questions with this implementation:
>> >
>> > 1) Is it worth the hassle?
>> > 2) Is S_SYNC good flag to use or should we use a new inode flag?
>> > 3) VM_FAULT_RO and especially passing of resulting 'pfn' from
>> >    dax_iomap_fault() through filesystem fault handler to dax_pfn_mkwrite() in
>> >    vmf->orig_pte is a bit of a hack. So far I'm not sure how to refactor
>> >    things to make this cleaner.
>>
>> 4) How does an application discover that it is safe to flush from
>>    userspace?
>
> I think that we would be best off with a new flag available via
> lsattr(1)/chattr(1).  This would have the following advantages:
>
> 1) We could only set the flag if the inode supported DAX (either via the mount
> option or via the individual DAX flag).  This would give NVML et al. one
> central way to detect whether it was safe to flush from userspace because the
> FS supported synchronous faults.
>
> 2) Defining a new flag prevents any confusion about whether the kernel version
> you have supports sync faults.  Otherwise NVML would have to do something like
> look at the trio of (kernel version, S_SYNC flag, mount/inode option for DAX)
> which is complex and of course breaks for OS kernel versions.
>
> 3) Defining the flag in a generic way via lsattr/chattr opens the door for the
> same API and flag to be used by other filesystems in the future.

I would advocate using a new fcntl() instead of lsattr for the
following reason: ISTM the fact that it's an *inode* flag in this
patchset is a bit of an implementation detail.  I can easily imagine a
future implementation that makes it per-struct-file instead.  A
fcntl() that asks "can I flush from userspace" would still work under
than scenario.