From: Ira Weiny <ira.weiny@xxxxxxxxx>

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via
a mount option is too coarse.

The following patch series enables selecting the use of DAX on individual
files and/or directories on xfs, and lays some groundwork to do so in ext4.
In this scheme the dax mount option can be omitted to allow the per-file
property to take effect.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
capability switch from an "effective" attribute for the file.

At LSF/MM we discussed the difficulties of switching the mode of a file
with active mappings / page cache. It was thought the races could be
avoided by limiting mode flips to 0-length files.

However, this turns out to not be true.[3] This is because address space
operations (a_ops) may be in use at any time the inode is referenced. In
addition, users have expressed a desire to be able to change the mode on a
file with data in it. For those reasons this patch set allows changing the
mode flag on a file as long as it is not currently mapped.

Furthermore, DAX is a property of the inode and as such, many operations
other than address space operations need to be protected during a mode
change. Therefore callbacks are placed within the inode operations and
used to lock the inode as appropriate.

As in V1, users are able to query the effective and physical flags
separately at any time. Specifically, the addition of the statx attribute
bit allows them to ensure the file is operating in the mode they intend.
The effective and physical flags could differ when, for example, the
filesystem is mounted with the dax flag.

It should be noted that the physical DAX flag inheritance is not shown in
this patch set as it was maintained from previous work on XFS.
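
For illustration only (this is not code from the series): a userspace check
of the effective flag through statx might look like the sketch below. It
assumes the attribute bit is named STATX_ATTR_DAX and, where the installed
headers do not define it, falls back to the value 0x00200000 found in
mainline linux/stat.h; the bit value used by this RFC may differ. The
helper names dax_state_from_attrs() and effective_dax() are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx (glibc >= 2.28) */

/* Assumption: fallback value taken from mainline linux/stat.h; the bit
 * value used by the RFC series may differ. */
#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000
#endif

enum dax_state { DAX_UNKNOWN = -1, DAX_OFF = 0, DAX_ON = 1 };

/* Hypothetical helper: interpret the two statx attribute words. */
enum dax_state dax_state_from_attrs(uint64_t attrs, uint64_t mask)
{
	if (attrs & STATX_ATTR_DAX)
		return DAX_ON;          /* inode is currently operating in DAX mode */
	if (mask & STATX_ATTR_DAX)
		return DAX_OFF;         /* bit is reported by the fs, and it is clear */
	return DAX_UNKNOWN;             /* fs/kernel does not report the bit */
}

/* Hypothetical helper: query the effective DAX state of a path.
 * Returns DAX_UNKNOWN if statx() fails for any reason. */
enum dax_state effective_dax(const char *path)
{
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
		return DAX_UNKNOWN;

	return dax_state_from_attrs(stx.stx_attributes,
				    stx.stx_attributes_mask);
}
```

Under those assumptions, effective_dax("/mnt/pmem/dax-file") would report
DAX_ON only when the kernel is actually using DAX for that inode,
independent of how the physical flag or the mount option is set.
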
The physical DAX flag and its inheritance will need to be added to other
file systems for user control.

Finally, extensive testing was performed which resulted in a couple of bug
fix and clean up patches. Specifically:

	fs: remove unneeded IS_DAX() check
	fs/xfs: Fix truncate up

'Fix truncate up' deserves specific attention because I'm not 100% sure it
is the correct fix. Without that patch fsx testing failed within a few
minutes with this error:

	Mapped Write: non-zero data past EOF (0x3b0da) page offset 0xdb is 0x3711

With 'Fix truncate up' applied, fsx can run for hours while changing modes,
but I have seen 2 other errors in the same genre after many hours of
continuous testing. They are:

	READ BAD DATA: offset = 0x22dc, size = 0xcc7e, fname = /mnt/pmem/dax-file
	Mapped Read: non-zero data past EOF (0x3309e) page offset 0x9f is 0x6ab4

After seeing the patches to fix stale data exposure problems[4] I'm more
confident now that all 3 of these errors are a latent bug rather than a bug
in this series itself.

However, because of these failures I'm only submitting this set as an RFC.

[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/

To: linux-kernel@xxxxxxxxxxxxxxx
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: "Theodore Y. Ts'o" <tytso@xxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: linux-ext4@xxxxxxxxxxxxxxx
Cc: linux-xfs@xxxxxxxxxxxxxxx
Cc: linux-fsdevel@xxxxxxxxxxxxxxx