Re: [PATCH V4 07/13] fs: Add locking for a dynamic address space operations state

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 21 Feb 2020 15:26:45 -0800

On Fri, Feb 21, 2020 at 2:44 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Fri, Feb 21, 2020 at 06:44:49PM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 20, 2020 at 04:41:28PM -0800, ira.weiny@xxxxxxxxx wrote:
> > > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> > >
> > > DAX requires special address space operations (aops).  Changing DAX
> > > state therefore requires changing those aops.
> > >
> > > However, many functions require aops to remain consistent through a deep
> > > call stack.
> > >
> > > Define a vfs level inode rwsem to protect aops throughout call stacks
> > > which require them.
> > >
> > > Finally, define calls to be used in subsequent patches when aops usage
> > > needs to be quiesced by the file system.
> >
> > I am very much against this.  There is a reason why we don't support
> > changes of ops vectors at runtime anywhere else, because it is
> > horribly complicated and impossible to get right.  IFF we ever want
> > to change the DAX vs non-DAX mode (which I'm still not sold on) the
> > right way is to just add a few simple conditionals and merge the
> > aops, which is much easier to reason about, and less costly in overall
> > overhead.
>
> *cough*
>
> That's exactly what the original code did. And it was broken
> because page faults call multiple aops that are dependent on the
> result of the previous aop calls setting up the state correctly for
> the latter calls. And when S_DAX changes between those calls, shit
> breaks.
>
> It's exactly the same problem as switching aops between two
> dependent aops method calls - we don't solve anything by merging
> aops and checking IS_DAX in each method because the race condition
> is still there.
>
> /me throws his hands in the air and walks away

Please come back, because I think it's also clear that the "we don't
support changes of ops vectors at runtime" assertion is already being
violated by ext4 [1]. So that leaves "IFF we ever want to change the
dax vs non-dax mode" which I thought was already consensus given the
lingering hopes about having some future where the kernel is able to
dynamically optimize for dax vs non-dax based on memory media
performance characteristics. I thought the only thing missing from the
conclusion of the last conversation [2] was the "physical" vs
"effective" split that we identified at LSF'19 [3]. Christoph, that
split allows for for your concern about application intent to be
considered / overridden by kernel policy, and it allows for
communication of the effective state which applications need for
resource planning [4] independent of MAP_SYNC and other dax semantics.

The status quo of globally enabling dax for all files is empirically
the wrong choice for page-cache friendly workloads running on
slower-than-DRAM pmem media.

I am struggling to see how we address the above items without first
having a dynamic / less than global-filesystem scope facility to
control dax.

[1]: https://lore.kernel.org/linux-fsdevel/20191108131238.GK20863@xxxxxxxxxxxxxx
[2]: https://lore.kernel.org/linux-fsdevel/20170927064001.GA27601@xxxxxxxxxxxxx
[3]: https://lwn.net/Articles/787973/
[4]: https://lwn.net/Articles/787233/