On Tue, Oct 02, 2018 at 02:10:39PM +0200, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 12:05:31PM +0200, Jan Kara wrote:
> > Hello,
> >
> > commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
> > removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
> > mean time certain customer of ours started poking into /proc/<pid>/smaps
> > and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
> > flags, the application just fails to start complaining that DAX support is
> > missing in the kernel. The question now is how do we go about this?
>
> OK naive question from me, how do we want an application to be able to
> check if it is running on a DAX mapping?
>
> AFAIU DAX is always associated with a file descriptor of some kind (be
> it a real file with filesystem dax or the /dev/dax device file for
> device dax). So could a new fcntl() be of any help here? IS_DAX() only
> checks for the S_DAX flag in inode::i_flags, so this should be doable
> for both fsdax and devdax.
>
> I haven't tried it yet but it should be fairly easy to come up with
> something like this.

OK, I have now tried it, on a normal file on Btrfs (without DAX, obviously)
and on a file on XFS mounted with the -o dax option.

Here's the RFC:

commit 3a8f0d23c421e8c91bc9d8bd3a956e1ffe3f754b
Author: Johannes Thumshirn <jthumshirn@xxxxxxx>
Date:   Tue Oct 2 14:51:33 2018 +0200

    fcntl: provide F_GETDAX for applications to query DAX capabilities

    Provide a F_GETDAX fcntl(2) command so an application can query
    whether it can make use of DAX or not.

    Both file-system DAX and device DAX mark the DAX capability in
    struct inode::i_flags using the S_DAX flag, so we can query it using
    the IS_DAX() macro on a struct file's inode.

    If the file descriptor refers to a device DAX node or to a file on a
    DAX capable file-system, '1' is written to the u64 that the fcntl(2)
    argument points to. If DAX isn't usable for some reason, '0' is
    written instead. The fcntl(2) call itself returns 0 on success.

    This patch can be tested with the following small C program:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <libgen.h>

    #ifndef F_LINUX_SPECIFIC_BASE
    #define F_LINUX_SPECIFIC_BASE 1024
    #endif

    #define F_GETDAX (F_LINUX_SPECIFIC_BASE + 15)

    int main(int argc, char **argv)
    {
    	/* The kernel side writes a u64, so the buffer must be 64 bit wide. */
    	uint64_t dax = 0;
    	int fd;
    	int rc;

    	if (argc != 2) {
    		printf("Usage: %s file\n", basename(argv[0]));
    		exit(EXIT_FAILURE);
    	}

    	fd = open(argv[1], O_RDONLY);
    	if (fd < 0) {
    		perror("open");
    		exit(EXIT_FAILURE);
    	}

    	rc = fcntl(fd, F_GETDAX, &dax);
    	if (rc < 0) {
    		perror("fcntl");
    		close(fd);
    		exit(EXIT_FAILURE);
    	}

    	/* Exit with success if DAX is usable, with failure otherwise. */
    	if (dax) {
    		printf("fd %d is dax capable\n", fd);
    		rc = EXIT_SUCCESS;
    	} else {
    		printf("fd %d is not dax capable\n", fd);
    		rc = EXIT_FAILURE;
    	}

    	close(fd);
    	exit(rc);
    }

    Signed-off-by: Johannes Thumshirn <jthumshirn@xxxxxxx>
    Cc: Jan Kara <jack@xxxxxxx>
    Cc: Michal Hocko <mhocko@xxxxxxx>
    Cc: Dan Williams <dan.j.williams@xxxxxxxxx>

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 4137d96534a6..0b53f968f569 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -32,6 +32,22 @@
 
 #define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
 
+static int fcntl_get_dax(struct file *filp, unsigned long arg)
+{
+	struct inode *inode = file_inode(filp);
+	u64 __user *argp = (u64 __user *)arg;
+	u64 dax;
+
+	if (IS_DAX(inode))
+		dax = 1;
+	else
+		dax = 0;
+
+	if (copy_to_user(argp, &dax, sizeof(*argp)))
+		return -EFAULT;
+	return 0;
+}
+
 static int setfl(int fd, struct file * filp, unsigned long arg)
 {
 	struct inode * inode = file_inode(filp);
@@ -426,6 +442,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_SET_FILE_RW_HINT:
 		err = fcntl_rw_hint(filp, cmd, arg);
 		break;
+	case F_GETDAX:
+		err = fcntl_get_dax(filp, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..65a59c3cc46d 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -52,6 +52,7 @@
 #define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
 #define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
 #define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
+#define F_GETDAX		(F_LINUX_SPECIFIC_BASE + 15)
 
 /*
  * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be

-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
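
For context, the smaps-based check described in the quoted report could look
roughly like the sketch below. The application's real code is not known, so
both helper names are hypothetical. The only assumption taken from the kernel
is the documented smaps format, where each VMA has a "VmFlags:" line of
two-letter codes and "mm" stands for VM_MIXEDMAP.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `line` is a "VmFlags:" line from
 * /proc/<pid>/smaps that contains the flag code `flag` as a whole token.
 */
static bool vmflags_line_has(const char *line, const char *flag)
{
	static const char prefix[] = "VmFlags:";
	static const char sep[] = " \t\n";
	size_t flen = strlen(flag);
	const char *p;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;

	p = line + sizeof(prefix) - 1;
	while (*p) {
		size_t n;

		p += strspn(p, sep);	/* skip separators */
		n = strcspn(p, sep);	/* length of the next token */
		if (n > 0 && n == flen && strncmp(p, flag, n) == 0)
			return true;
		p += n;
	}
	return false;
}

/*
 * Hypothetical helper: report whether any VMA of the calling process
 * carries `flag`. Returns 1 or 0, or -1 if smaps cannot be opened.
 */
static int self_has_vmflag(const char *flag)
{
	char line[512];
	int found = 0;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (vmflags_line_has(line, flag)) {
			found = 1;
			break;
		}
	}

	fclose(f);
	return found;
}
```

An application doing something like self_has_vmflag("mm") after mmap()ing a
DAX file would get 0 once commit e1fb4a086495 is applied, even though DAX
still works. That is the failure mode a file descriptor based query is meant
to avoid.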