On Fri, Feb 21, 2020 at 2:44 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Fri, Feb 21, 2020 at 06:44:49PM +0100, Christoph Hellwig wrote: > > On Thu, Feb 20, 2020 at 04:41:28PM -0800, ira.weiny@xxxxxxxxx wrote: > > > From: Ira Weiny <ira.weiny@xxxxxxxxx> > > > > > > DAX requires special address space operations (aops). Changing DAX > > > state therefore requires changing those aops. > > > > > > However, many functions require aops to remain consistent through a deep > > > call stack. > > > > > > Define a vfs level inode rwsem to protect aops throughout call stacks > > > which require them. > > > > > > Finally, define calls to be used in subsequent patches when aops usage > > > needs to be quiesced by the file system. > > > > I am very much against this. There is a reason why we don't support > > changes of ops vectors at runtime anywhere else, because it is > > horribly complicated and impossible to get right. IFF we ever want > > to change the DAX vs non-DAX mode (which I'm still not sold on) the > > right way is to just add a few simple conditionals and merge the > > aops, which is much easier to reason about, and less costly in overall > > overhead. > > *cough* > > That's exactly what the original code did. And it was broken > because page faults call multiple aops that are dependent on the > result of the previous aop calls setting up the state correctly for > the latter calls. And when S_DAX changes between those calls, shit > breaks. > > It's exactly the same problem as switching aops between two > dependent aops method calls - we don't solve anything by merging > aops and checking IS_DAX in each method because the race condition > is still there. > > /me throws his hands in the air and walks away Please come back, because I think it's also clear that the "we don't support changes of ops vectors at runtime" assertion is already being violated by ext4 [1]. So that leaves "IFF we ever want to change the dax vs non-dax mode" which I thought was already consensus given the lingering hopes about having some future where the kernel is able to dynamically optimize for dax vs non-dax based on memory media performance characteristics. I thought the only thing missing from the conclusion of the last conversation [2] was the "physical" vs "effective" split that we identified at LSF'19 [3]. Christoph, that split allows for for your concern about application intent to be considered / overridden by kernel policy, and it allows for communication of the effective state which applications need for resource planning [4] independent of MAP_SYNC and other dax semantics. The status quo of globally enabling dax for all files is empirically the wrong choice for page-cache friendly workloads running on slower-than-DRAM pmem media. I am struggling to see how we address the above items without first having a dynamic / less than global-filesystem scope facility to control dax. [1]: https://lore.kernel.org/linux-fsdevel/20191108131238.GK20863@xxxxxxxxxxxxxx [2]: https://lore.kernel.org/linux-fsdevel/20170927064001.GA27601@xxxxxxxxxxxxx [3]: https://lwn.net/Articles/787973/ [4]: https://lwn.net/Articles/787233/